STUDIES IN ECONOMICS AND POLITICAL SCIENCE 

Edited by Prof. W. A. S. HEWINS, M.A., 

Director, London School of Economics.



ELEMENTS 



OF 



STATISTICS 






BY 



ARTHUR L. BOWLEY, M.A., F.S.S.,

Lecturer in Statistics at the London School of Economics
and Political Science ;
Author of "Wages in the United Kingdom in the Nineteenth Century"



THIRD EDITION. 



London:
P. S. KING & SON,
ORCHARD HOUSE, WESTMINSTER.

New York: CHARLES SCRIBNER'S SONS.

1907. 









PREFACE. 



This book is based on lectures given at the London 
School of Economics and Political Science in the five 
years following its foundation in 1895. There seems to 
be no text-book in English dealing directly and com- 
pletely with the common methods of statistics. English 
writings on the various branches of the science are for 
the most part in the form of articles in the journals of 
learned societies. Professor Mayo Smith in his Statistics 
and Sociology proceeds almost at once to historical 
applications ; while in Professor Meitzen's Geschichte, 
Theorie, und Technik der Statistik, issued in English
by the American Academy of Political and Social 
Science, so much space is devoted to the history of 
the development of statistics, and the book is so slight, 
in comparison with the wide field it covers, that many 
elementary methods are treated very cursorily. In the 
excellent books in French, German, and Italian on this 
subject there is a general tendency to deal at length 
with the history of official statistics, the limits of the 
science, and particular applications of the theory of pro- 
bability, to the exclusion of more general matter; so that 
a student must refer to the works of Dr Mayr, Professor 
Westergaard, Professor Lexis, Professor Gabaglio, M.
Block, and Dr Bertillon before he is completely acquainted
with the elementary methods of statistics. The
result is that there is no compact statement of principles 
acknowledged by statisticians, of the methods common to 
most branches of statistical work, of the artifices developed 
for handling and simplifying the raw material, and of the 
mathematical theorems by the use of which the results of 
investigations may be interpreted. This book forms an 
attempt to supply this want, so far as can be done without 
undue length. No place has been given in it to the 
history of statistics, and it does not contain any summary 
of the main groups of statistics extant ; several tables, 
drawn from a wide range of subjects, are given, but only 
to illustrate particular methods, and their choice has been 
determined by their suitability for this purpose. In the 
chapter on Collection of Material some account is offered 
of the genesis of the most important English statistics : 
the great part of the figures tabulated in the Statistical 
Abstract can be traced back to the householder's schedule
of the Population Census or the custom house returns of 
foreign trade, while the chief statistics accessible for the 
study of modern social questions have come from the 
Wage Census of 1886 or are collected by the Labour 
Department : it is hoped that the account of these four 
groups of figures will afford some help in judging of 
their accuracy and limitations. Considerable space has 
been allotted to the subjects of Averages and Diagrams, 
because their use is universal, and, while their principles 
and technique are simple, their application is often mis- 
understood. The chapter on Accuracy is based on the 
Newmarch Lectures of 1897, and may perhaps be found 
to contain something that is new, a claim which is not 
made for the great part of the book. The treatment
throughout is intended to be suitable for those whose
mathematics have not been carried to any height or have 
become rusty from disuse. With this view, when mathe- 
matical symbols were unavoidable, the preliminary 
hypotheses have been first discussed without algebraic 
notation and at some length, and those proofs have been 
chosen which require the minimum mathematical know- 
ledge rather than those which lead most directly to the 
result. Thus the most important results of the Theory 
of Error have been obtained without the use of the Differ- 
ential or Integral Calculus, and it is hoped that the greater 
part even of the chapter on Correlation will be intelligible 
to those who are not so well equipped as the Major- 
General in the Pirates of Penzance. Part II. is in- 
tended to be introductory and is certainly incomplete ; the 
normal law of frequency is the only one discussed, and 
the correlation of three variables is untouched. The more 
advanced treatment of this part of the subject is likely 
to be of interest to but few, who will have little trouble 
in obtaining the books and journals in which the further 
development may be found. Short bibliographies are 
added to the chapter on Interpolation and to Part II. for 
this purpose. It is hoped that this elementary handling 
may be of use to some who are interested in the statistical 
arguments based on the Laws of Probability, and that 
the definitions, formulae, and proofs given may save others 
from the necessity of searching in books, long out of 
print, for elementary theorems and deductions. The 
treatment in Part II., Section II., is peculiar in that it 
leaves very much in the background the Method of Least 
Squares ; the phrase, useful in some connections, seems 
to make the application of the Law of Error to statistics
unnecessarily complex. I am much indebted to Professor
Edgeworth, who has not only given me continued help 
both privately and by his publications in the study of the 
mathematical treatment of statistics, but has also read 
Part II. in proof and suggested many useful and important 
alterations. My thanks are also due to Professor Everett 
and Mr W. F. Sheppard for help in the chapter on 
Interpolation, and to Mr C. P. Sanger and Mr H. Clissold 
for reading great parts of the book in proof. 

A. L. B. 

London School of Economics, 
January 1901. 



NOTE TO SECOND EDITION. 

It has not been possible, even if it were advisable, to introduce
any considerable changes in this edition. The text has been
revised and notes have been added where the meaning had
proved obscure. Pages 117, 118, relating to the importance of
weights, have been re-written, and the canon that errors in
weighting are generally unimportant has been explained and
qualified. The measurement of dispersion has been developed
on pages 126 and 136, and an Appendix has been added dealing
with the asymmetrical curve of error on the lines laid down by
Professor Edgeworth. It is hoped that the views of friendly
critics have by these means been met, and that the book has
been rendered more complete.

A. L. B. 

September 1902. 

NOTE TO THIRD EDITION. 

Many of the diagrams have been brought up to date, and the 
corresponding text has been adapted, the bibliography on 
pp. 327-8 extended, and three minor misprints corrected. Other- 
wise the book is unaltered. 

A. L. B. 

April 1907. 



TABLE OF CONTENTS. 

PART I.

CHAP.

I. Scope and Meaning of Statistics
II. The General Method of Statistical Investigation
III. Illustrations of Method:
      Section I. The Population Census
      Section II. The Wage Census
      Section III. The Work of the Labour Department
      Section IV. The Statistics of England's Foreign Trade
IV. Tabulation : General ; Mr Booth's Use of Census ; Agricultural
      Earnings ; U.S.A. Wage Statistics ; Wage Census ;
      Changes of Wages
V. Averages : A. Simple ; B. Weighted ; C. The Mode ;
      D. The Median ; E. The Geometric Mean ; F. Statistical
      Coefficients ; G. General
VI. Use of Averages in Tabulation
VII. The Graphic Method:
      Section I. General Purpose
      Section II. Historic Diagrams
      Section III. Comparisons of Series of Figures
      Section IV. Periodic Figures
      Section V. Logarithmic Curves
VIII. Accuracy
IX. Index-Numbers
X. Interpolation:
      Section I. General
      Section II. Algebraic Treatment

PART II.

Application of the Theory of Probability to Statistics:
      Section I. Introductory
      Section II. The Equation of the Curve of Error
      Section III. To what Groups does the Law of Error apply?
      Section IV. The Permanence of Certain Small Numbers
      Section V. Extension of the Law of Error and Applications
      Section VI. The Theory of Correlation

Appendix
Index






ERRATUM. 

Page 308, line 1, for "The modulus for the whole 300," read "The modulus for
the average of the 300."



LIST OF DIAGRAMS 



Graphic Method of Finding the Median, Quartiles, and Deciles
Graphic Representation of Wage Statistics
Total Value of British and Irish Produce Exported from the United
      Kingdom, 1855-1906
Graphic Method of Determining the Median and Modes
Revenue of the United Kingdom, 1850-1905
Importation of Wheat and Wheat Flour, 1862-1906
History of the Cotton Trade, 1854-1906
Trade of British Possessions and Foreign Countries
Marriage Rate and Foreign Trade
Fluctuations of Employment, 1855-1893
Growth of Imports and Exports (Natural and Logarithmic Scale)
Marriage Rate and Employment (Logarithmic Scale)
Symmetry of the Binomial Curve near its Greatest Ordinate
Table of Minima and Maxima Temperatures in 1898
Example of the Skew Curve of Error
The Curve of Error (at end)



ADDENDA AND CORRIGENDA TO THE 

THIRD EDITION. 



See page 337, after the Index. 






PART I. 



CHAPTER I. 
SCOPE AND MEANING OF STATISTICS. 




[Sidenote: Definitions of statistics.]
Very many definitions have been given of the word statistics,
and each author who has written on the subject has assigned new
limits to the field which should be included in its
scope. It will not be necessary for the purpose of
this book to discuss the merely verbal differences involved, but 
only to explain what is intended by its title, and to consider 
the limits of the science which it is proposed to investigate. It 
will be useful, however, to mention some possible definitions. 

[Sidenote: The science of counting.]
[Sidenote: Distinction between statistics and arithmetic.]
[Sidenote: Statistics as co-operative counting.]
Statistics may, for instance, be called the science of counting.
Counting appears at first sight to be a very simple operation,
which any one can perform or which can be done
automatically ; but, as a matter of fact, when we
come to large numbers, e.g., the population of the United Kingdom,
counting is by no means easy, or within the power of an
individual ; limits of time and place alone prevent it being so
carried out, and in no way can absolute accuracy be obtained
when the numbers surpass certain limits. Great numbers are
not counted correctly to a unit, they are estimated ; and we might
perhaps point to this as a division between arithmetic
and statistics, that whereas arithmetic attains
exactness, statistics deals with estimates, sometimes
very accurate, and often sufficiently so for their purpose,
but never mathematically exact. Statistics generally relate to
numbers so great that their estimation is beyond the power of
an individual, and requires the co-operation of an
organised body of workers. Though the collection
of numbers by several persons and the mere
addition of the results seem simply questions of arithmetic, yet 
in practice two difficulties soon occur. First, it is not easy to 
define the thing to be counted so explicitly that all the tellers 
shall admit and reject instances on the same principles ; for such
simple objects as the number of rooms or stories of a house, a
person's age, even an individual, give rise to such complex ques- 
tions of definition that it is often impossible to tell from a short 
description of a category exactly what items are included in it. 
Secondly, numerical errors cannot be avoided when many 
workers are involved ; for some among a large number of 
persons will be inaccurate, some unintelligent, some will not 
obtain complete information, and when their reports are com- 
piled there will be occasional mistakes in copying and errors in 
tabulation. A total which is the result of the work of many 
hands will certainly from one cause or another fall short of 
complete accuracy. But though all estimates of this nature are 
sometimes included under the term statistics, this definition at
once is too wide, and also does not bring out the distinctive 
nature of statistical method. 

[Sidenote: Statistics as a method.]
[Sidenote: Generality of statistical method.]
It is better, in fact, to define statistics a posteriori. In dealing
with masses of figures, large numbers descriptive of groups, series
of totals or averages relating to different dates or
places, it is found that special methods become
necessary : methods which depend on particular properties of
large numbers, methods which are suitable for describing complex
groups so that they can be easily comprehended, methods
for analysing the accuracy of statements, for measuring the
significance of differences, for comparing one estimate with
another. Those estimates to which these methods apply are
within the scope of statistics ; it is the study of these methods
that is the object of this book. It is clear that, under our
tentative definition, statistics is not merely a branch of political
economy, nor is it confined to any one science. A
knowledge of statistics is like a knowledge of
foreign languages or of algebra : it may prove of
use at any time under any circumstances.

[Sidenote: Its use in the physical sciences.]
It may be interesting to trace the connection of statistical
method with various branches of knowledge. To begin
with the physical sciences : there are two points
in which this method touches astronomy. The
method of least squares was introduced by an astronomer, 
anxious to choose the best of several slightly discrepant observa- 
tions of the position of a star. In most physical observations 
several measurements are taken of the same quantity, and it is 
found that, however carefully they are made, they never absolutely
agree ; just as the averages obtained by different statisticians
from the same series of sociological observations are generally 
not identical. From such a group of measurements it is neces- 
sary to deduce the most probable estimates ; this is done by the 
application of the law of error, known as the method of least 
squares. 
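
The reasoning of the paragraph above can be illustrated by a minimal sketch in Python, which is of course no part of the original text. It assumes that the errors of observation are independent and that all the readings deserve equal weight; on that assumption the estimate which makes the sum of the squared deviations least is the arithmetic mean. The readings and the function name are hypothetical.

```python
from statistics import fmean


def sum_of_squares(estimate, readings):
    """Sum of squared deviations of the readings from a candidate estimate."""
    return sum((r - estimate) ** 2 for r in readings)


# Hypothetical repeated measurements of one and the same quantity.
readings = [10.2, 9.9, 10.1, 10.0, 9.8]

# For a single quantity the least-squares estimate is the arithmetic mean.
best = fmean(readings)

# Moving the estimate a little in either direction increases the sum of squares.
for candidate in (best - 0.1, best + 0.1):
    assert sum_of_squares(candidate, readings) > sum_of_squares(best, readings)

print(f"least-squares estimate: {best:.2f}")
```

The check on neighbouring candidates is only a numerical illustration and not a proof; the general argument belongs to the Theory of Error in Part II.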

[Sidenote: Progressive accuracy.]
The other point of resemblance of statistical to astronomical
method is common also to geology and to most applied sciences.
The course of scientific measurement has generally
been to make first a rough observation of a quantity,
such as the distance of the sun, the thickness of a stratum, the
atomic weight of an element, the specific gravity of a substance ; 
then, as information accumulated, as the precision of instruments 
increased and methods were better adapted, to make the measure- 
ment gradually more and more accurate. It is important 
to appreciate this development, for in the present state of our 
knowledge, many statistical measurements cannot be made with 
precision for want of data, and a critic is inclined to say that for 
this reason preliminary estimates are valueless ; but from the 
scientific point of view this criticism is wrong, for a faulty 
measurement made on logical principles is better than none, 
and may lead to others with progressive improvement.

[Sidenote: Statistics and biology.]
Passing by the general resemblance of statistical investigations
to all scientific experiments, we may notice the use of statistics
in biology. It was, perhaps, not recognised before
the publication of Professor Karl Pearson's investigations,*
that the whole doctrine of evolution and heredity
rests in reality on a statistical basis. It is in this direction that
the most important new work in statistics is being done. It may
be worth while to sketch very briefly the nature of the problem.
Out of a great number of observations, say the measurements of
the heights of a group of men, the type is found : the average,
about which all the measurements are grouped according to some
definite law. The problem is then to determine whether this
type or the grouping about it changes, and in what way. The
differences found in successive generations form the data on
which arguments as to evolution and development are founded.
The method applies equally to fossil remains, to zoological
species, and to many other groups. If it is neglected, many
valid arguments lose a great part of their force, and theories
are founded on personal impressions of phenomena instead
of on scientific measurement. The work done in this
direction becomes of immediate use to the student of social
questions. The average wage and the grouping about it
and the change in these quantities present precisely similar
problems ; the change in the purchasing power of money is
calculated by the same mathematical formulae ; in fact, these
methods furnish the only accurate way of measuring numerical
changes in complex groups. Much valuable information has
been collected in anthropometrical laboratories, which has increased
the statistician's knowledge of facts and given birth to
important theoretical principles.

* See The Grammar of Science, chap. x. seq., and the references there given.
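
The problem of the type and the grouping about it may be sketched as follows. This is an illustration only: the heights are hypothetical, and the standard deviation is taken merely as one convenient measure of grouping, not as the measure the text itself prescribes.

```python
from statistics import fmean, pstdev


def type_and_grouping(measurements):
    """Return the type (arithmetic mean) and the grouping about it (standard deviation)."""
    return fmean(measurements), pstdev(measurements)


# Hypothetical heights, in inches, for two successive generations.
fathers = [66, 67, 68, 68, 69, 70]
sons = [67, 68, 68, 69, 70, 72]

type_f, spread_f = type_and_grouping(fathers)
type_s, spread_s = type_and_grouping(sons)

# The two questions of the text: has the type changed, and has the grouping changed?
print(f"change in type:     {type_s - type_f:+.2f} in.")
print(f"change in grouping: {spread_s - spread_f:+.2f} in.")
```

The same two quantities could equally be computed for wages at two dates, which is the sense in which the text calls the social problem precisely similar.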

[Sidenote: Statistics and meteorology.]
Meteorology has much in common with statistics. The chief
measurements taken for the purposes of this science are of
temperature, barometrical pressure, moisture of the
air, and force of the wind. One of the problems
attacked is again that of finding the type from a group of 
observations, and of measuring its change. The tables which 
state the average temperature year by year are in many ways 
similar to those which the Registrar-General publishes of births, 
deaths, and marriages. Without the aid of statistical method, 
the averages obtained show mere numbers from which no logical 
deductions can be made. With the help of this knowledge, it 
can be seen whether the change from year to year is significant 
or accidental ; whether the figures show a progressive or periodic 
change; whether they obey any law or not. The problem is 
easily seen to be of importance for forecasting the future 
population and for many similar purposes. 

[Sidenote: Statistics and demography.]
We are thus brought by a short step to the province to which
statistics has sometimes been confined : the study of demography.
If in demography we include, not merely the
measurement of the numbers of the population,
the birth, marriage, and death rates, the distribution by age, by
sex, and by locality, in fact, the figures which naturally come
from the census and the Registrar-General's returns ; but include
also, industrial and social measurements, of distribution of the
population by trade, of income, wages, production, foreign trade,
transportation, and so forth ; we have extended the limits of
demography till it includes the majority of the statistical 
investigations directly interesting to students of sociology
or of political economy. Without stopping to decide the
exact limits of demography, we can quickly pass to another
definition of statistics (so far as it concerns such students) on
which it is wished to lay a certain stress : statistics is the science
of the measurement of the social organism, regarded as a whole, in
all its manifestations. In a monograph, after the
fashion of Le Play, a single family is studied ; the
occupations and earnings of its members, the way
these earnings are spent, and its economic position
generally are set down ; but this study is not so far statistical. 
In demography we study the same quantities when groups of 
families are concerned ; the number of families engaged in certain 
industries, and their average receipts, expenditure, and savings ; 
here we have statistics. In the monographic method the indi- 
vidual is everything ; in the statistical method, nothing. When 
we wish to obtain a measurement of the group, peculiarities 
of individuals receive no attention ; it is only when the same 
peculiarities are possessed by many persons that they become of 
importance. Statistics may rightly be called the science of 
averages. In the measurement of a complex group, say of 
incomes and wages, the exceptional artiste who can earn £100
in an evening, and the inefficient labourer who can only make 
sixpence a day, affect only slightly the general average ; they 
are not entered in separate categories ; but the large group of 
skilled artisans who can earn over forty shillings a week, or of 
casual labourers who make less than fifteen shillings, are entitled 
to separate notice. The exact specification to be adopted is only 
a question of degree, which differs with the nature of the par- 
ticular investigation in hand. The object of a statistical estimate 
of a complex group is to present an outline, to enable the mind 
to comprehend with a single effort the significance of the whole. 
To do this it is necessary to exclude rigorously any presentation 
of details, for the same reason that, in a painter's rendering of a 
tree, the individual leaves are not distinguished. The outline 
will be a little blurred, a little inaccurate ; but it will be as 
distinct and detailed as the mind has power to grasp it, or the 
eye to see it ; the impression will be rightly given. There is a 
very important principle involved in this method. The individual 
members of a group vary continually, the whole group varies 
very slowly. It is impossible to follow or measure the motions 
of separate atoms ; it is comparatively easy to state the laws of
motion for a solid body. Great numbers and the averages
resulting from them, such as we always obtain in measuring 
social phenomena, have great inertia. The total population, the 
total income, the birth and death rates, average wages, change 
very little ; similar quantities relating to a single family change 
very fast. It is this constancy of great numbers that makes 
statistical measurement possible. It is to great numbers that 
statistical measurement chiefly applies. 
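
The constancy of great numbers can be illustrated by a small simulation, offered as a sketch under stated assumptions and not as anything in the original. It assumes that each income fluctuates independently from year to year between hypothetical limits of 50 and 150; the function name, the group sizes, and the number of years are likewise assumptions made for illustration.

```python
import random
from statistics import fmean, stdev


def spread_of_average(group_size, years=200, seed=1):
    """Standard deviation, across simulated years, of a group's average income."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    yearly_averages = []
    for _ in range(years):
        # Hypothetical incomes, drawn independently for each member each year.
        incomes = [rng.uniform(50, 150) for _ in range(group_size)]
        yearly_averages.append(fmean(incomes))
    return stdev(yearly_averages)


single_family = spread_of_average(group_size=1)
large_group = spread_of_average(group_size=2500)

print(f"year-to-year spread, single family: {single_family:.2f}")
print(f"year-to-year spread, group of 2500: {large_group:.2f}")
```

Under these assumptions the average of the large group moves far less from year to year than the income of a single family. Real incomes are not independent of one another, so the sketch shows the tendency only, not its exact amount.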

[Sidenote: Statistics and political economy.]
The relation of statistics to political economy is a simple
one. Professor Marshall says,* "Statistics are the straw out of
which I, like every other economist, have to make
the bricks." The statistician furnishes the political
economist with the facts, by which he tests his
theories or on which he bases them. Since the economist deals 
chiefly with phenomena relating to groups, and regards the 
individual only as a member of a group, it is to statistics as 
the science of averages that he looks for his information. When 
he is dealing with national economy, with the volume of trade, 
for instance, or the purchasing power of money, he is limited to 
pure theory, till statistics as the science of great numbers has 
provided the facts. The chemist experimenting in his laboratory 
is like the statistician ; the chemist theorising in his study is like 
the economist. Because of this relation it may be held to be the 
business of the statistician to collect, arrange, and describe, like 
a careful experimentist, but to draw no deductions ; even in an 
investigation relating to cause and effect, to present evidence but 
not conclusions. As a distinct operation, of course, the statistician 
may assume the rôle of the economist, for the same man may well
be fitted to conduct the experiment and fit the theory. And just 
as a theoretical chemist will have little or no power unless he 
fully appreciates experimental methods and difficulties, even if 
he has not the manual dexterity to conduct them to perfection 
himself, so no student of political economy can pretend to com- 
plete equipment unless he is master of the methods of statistics, 
knows its difficulties, can see where accurate figures are possible, 
can criticise the statistical evidence, and has an almost instinctive 
perception of the reliance that he may place on the estimates 
given him. 

* Evidence to the Committee on the Census, 1890.

[Sidenote: Statistics enlarge individual experience.]
The proper function, indeed, of statistics is to enlarge individual
experience. An individual is limited to what he can
himself see, a very small part of one division of
the social organism ; his knowledge is extended in
various ways, by the conversation of his acquaintance,
by newspaper reports, by the writings of experts. According
to his ability and power of judgment, he will be able to form
a correct view of the numerical importance of groups of persons 
and things ; but it is in the highest degree improbable that he 
will not have been biassed by the peculiarities of his position, 
and that he will place his different items of information in the 
right perspective ; and he will not be able to gauge rightly the 
accuracy of his data. As soon as he begins to examine these 
points he is undertaking a statistical investigation, and will very 
soon find himself involved in all the difficulties and problems 
from which a knowledge of statistical method alone can dis- 
entangle him. This is the obvious answer to those who deny 
the use of statistics. A statistical estimate may be good or bad, 
accurate or the reverse ; but in almost all cases it is likely to be 
more accurate than a casual observer's impression, and in the 
nature of things can only be disproved by statistical methods. 

[Sidenote: Statistics are comparative.]
A chief practical use of statistics is to show relative importance,
the very thing which an individual is likely to misjudge.
Statistics are almost always comparative. The absolute
magnitude of a quantity is of little meaning
to us till we have some similar quantity with which to compare 
it. A statement of the number of paupers in the United King- 
dom is valueless unless we know the total population. A state- 
ment of the number of gallons of water supplied per head to the 
people of East London is of little meaning to us till we know 
the quantity supplied to other towns. The average wage, shown 
in the Wage Census, does not convey its full significance till we 
have similar computations for other countries or relating to other 
years. In the case of most statistical estimates, it will be found 
that we need another for comparison before we can appreciate 
the meaning of the first. 
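
The point may be put arithmetically in a short sketch. The figures are hypothetical and are chosen only to show how an absolute number can mislead until it is referred to the population concerned; the function and the district names are assumptions of the illustration.

```python
def rate_per_thousand(count, population):
    """Express an absolute count as a rate per 1,000 of the population."""
    return 1000 * count / population


# Hypothetical districts: A has more paupers in all, B has more in proportion.
paupers_a, population_a = 12_000, 400_000
paupers_b, population_b = 9_000, 150_000

rate_a = rate_per_thousand(paupers_a, population_a)
rate_b = rate_per_thousand(paupers_b, population_b)

print(f"District A: {paupers_a} paupers, {rate_a:.1f} per 1,000")
print(f"District B: {paupers_b} paupers, {rate_b:.1f} per 1,000")
```

Judged by absolute numbers District A appears the worse; judged by the rate, District B has twice the proportion of paupers.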

[Sidenote: Official statistics:]
If the group of objects which we wish to measure is large,
its enumeration will be beyond our unassisted efforts, or those of
any organisation at our command. Some investigations,
indeed, have been successfully conducted by
private organisations, for instance, those which resulted in Mr
Booth's Life and Labour of the People, and Leone Levi's Wages and
Earnings ; but in general the measurement of a part of the social
body or industrial organism must be undertaken by the central or 
local governments, if it is to be successfully carried out. The fact 
that this is the case explains the heterogeneousness and imper- 
fection of the mass of statistics extant. A government naturally 
collects numerical information only in relation to its own func- 
tions. Thus the administration must know the numbers of the 
population and the area of the country in gross and in detail for 
its own purposes. Large groups of figures come simply from the 
necessity of public account-keeping. Many official figures are 
bye-products; for office purposes an account is kept of all 
transactions in which the government has a hand, and of in- 
dustries subject to special regulations ; and the government 
publishes most of the figures which thus come in its way. To 
such causes are due our knowledge of the statistics of income, 
education, imports, railways, mines, factories, and so on. Some 
such publications are only survivals from a former time, when the 
figures were directly needed, such as Gazette wheat prices (still 
used for the calculation of tithes), and to some extent, statistics 
of exports. Though few figures are collected simply for scientific 
purposes, yet in many cases schedules issued for administrative 
ends are used at the same time for the reception of other 
information, of use chiefly to the sociological student ; much of 
the Census information comes under this heading. A view of 
those figures, relating to the United Kingdom, which are easily 
accessible to the student, can be obtained by turning through 
the annual Statistical Abstract for the United Kingdom, the
Annual Abstract of Labour Statistics, and the Registrar-General's
Annual Report ; in one or other of these, summaries of, and 
references to, most official statistics are to be found. 

[Their incompleteness.]

It is clear that figures collected simply in connection with
administrative purposes are not likely to be precisely those
which are needed by the student of sociology or
political economy. Even where the wants of the
official and the student are nearly identical, the classification and 
tabulation may not meet scientific requirements. There has, 
indeed, been considerable progress in recent years, due one may 
suppose to the influence of Sir R. Giffen, in the direction of 
amassing statistical information not absolutely needed by the 
administration, and most of the work of the Labour Department
is of this kind ; but very much more might reasonably be done
at an expense which would be almost negligible when considered
in relation to the national income. Thus the census might be 
made, in part at least, quinquennial, and the body of workers, 
who are organized once in ten years to conduct it, only to be 
disbanded when the report is issued, might be made permanent 
and entrusted with the organization of a decennial industrial 
census. Market prices of many staple commodities could be 
tabulated by local officials in the same way as wheat sales 
are now registered. Movements of goods by rail could be 
tabulated in the same way as transport by water, and the 
anomaly that we know more of our foreign than of our home 
trade be removed. The production of factories might be re- 
turned as well as that of mines. A permanent government 
office might well be charged from time to time with special 
investigations, similar perhaps to the Wage Census of the Board 
of Trade. It needs very little study of statistics or of political
economy, to feel the pressing need of some of this information.

[Statistics specially needed.]

Attention may be drawn to some of the gaps in our
knowledge. When dealing with our national income
we can obtain statistics of wages, and of income subject to tax ; 
but for salaries below the exemption limit, and for part of the 
income received for foreign investments, we are forced to rely 
on educated guesses. For the change of the purchasing power 
of money we know, thanks chiefly to the Economist and trade 
newspapers, the course of wholesale prices, but many interesting 
calculations are brought to a standstill because of the complete 
dearth of records of retail prices. With regard to wages, we can 
estimate fairly accurately standard and average wages, but, in 
default of an industrial census, do not know how many persons 
are in receipt of each given wage, nor the relative numbers of 
masters and men. We know fairly well the mass of trade that 
leaves or reaches our shores, but as regards the far greater mass 
of our internal trade our ignorance is almost complete. Till 
there is a public demand for such information, it will need a very 
enlightened government to spare the time, trouble, and expense 
necessary for a systematic attempt to fill up these gaps ; but 
we can all do something towards this enlightenment, and in 
furtherance of this demand, by studying what has been done 
in other countries, and building up a knowledge of the science 
of statistical investigation. 

[Distrust of statistics: its causes.]

The absence of such a demand is perhaps due to a widely
spread and not unreasonable distrust of statistical estimates,
crystallized in the common remark that "anything
can be proved by statistics." This is to a great
extent the fault of the criticising public themselves : they are
always requiring and the newspapers always supplying information,
which depends on a statistical basis, but for which good
statistics are not to be found for one or other of the reasons
already indicated. The informant must perforce
turn to inaccurate estimates, and the public has no
knowledge or discrimination as to what estimates rest on satis- 
factory data, or indeed as to what quantities are capable of 
statistical evaluation. Again, figures which cover only part of 
the subject, such as the Wage Census average, or the Labour 
Gazette returns of unemployed, may be quoted as universal; mere 
estimates, made for quite other purposes, may be given as 
accurate and complete ; and on such unreliable premises argu- 
ments are based, which naturally, by a judicious choice of 
material, can be made to support any theory at pleasure. It 
will generally be found that the statistician, on whose authority 
such statements are supposed to be based, is not to blame. 
Some of the common ways of producing a false statistical 
argument are to quote figures without their context, omitting 
the cautions as to their incompleteness, or to apply them to a 
group of phenomena quite different to that to which they in 
reality relate; to take estimates referring to only part of a 
group as complete; to enumerate the events favourable to an 
argument, omitting the other side ; and to argue hastily from 
effect to cause, this last error being the one most often fathered 
on to statistics. For all these elementary mistakes in logic, 
statistics is held responsible. 

[Limitations of statistics.]

Perhaps statisticians themselves have not always fully recognised
the limitations of their work. At best they can measure
only the numerical aspect of a phenomenon ; while
very often they must be content with measuring
not the facts they wish, but some allied quantity. We wish to 
know, for instance, the extent of poverty, its increase or diminu- 
tion : poverty we cannot define or measure, and we cannot even 
count the number of the poor; all we can do is to state the 
number of officially recognised paupers, and add perhaps some 
estimates from private sources ; but this gives us no clue to the 
intensity of poverty in individual cases. Or we wish to obtain
statistics of health : all we can measure is the death-rate and
average length of life, and the prevalence of some diseases, very 
different matters. The statistician's contribution to a sociological 
problem is only one of objective measurement, and this is fre- 
quently among the less important of the data ; it is as necessary, 
however, to its solution as accurate measurements are for the 
construction of a building. 



CHAPTER II. 

THE GENERAL METHOD OF STATISTICAL 

INVESTIGATION. 




At first sight it will seem as if there were no method common 
to all statistical investigations, and indeed the processes differ 
so widely that it is not easy to outline a scheme which will 
include them all ; but the following sequence is generally 
indicated* as of general application, and will serve at least 
to thread an examination of methods together : (1) the Collection
of Material, (2) its Tabulation, (3) the Summary, and (4) a 
Critical Examination of its results. These processes will be 
discussed in detail in the following chapters. 

[Knowledge necessary or expedient.]

It may be well to state what equipment is necessary for the
student who wishes to learn statistical methods. In collection
and tabulation common-sense is the chief requisite,
and experience the chief teacher ; no more than
a knowledge of the simplest arithmetic is necessary
for the actual processes ; but since, as we shall
see immediately, all the parts of an investigation are
interdependent, it is expedient to understand the whole before
attempting to carry out a part. For summarising, it is well to 
have acquaintance with the various algebraic averages, and 
with enough geometry for the interpretation of simple curves, 
though all the operations can be performed without the use of 
algebraic symbols. For criticism of estimates and interpre- 
tation of results, it is necessary to use the formulae of more 
advanced mathematics, and it is obviously expedient to under- 
stand the methods by which these formulae are obtained to 
ensure their intelligent use. They are specially necessary for 
the comparison of complex groups, and for estimating the 
significance of a divergence from the average, or the deviations 
in a list of periodic figures. 

* See, e.g., Dr Bertillon's Cours élémentaire de Statistique, to which the
present author is indebted for some of the treatment in the following pages.

[Collection: blank forms; nature of the questions.]

(1.) Information is generally collected by issuing blank circulars,
forms of inquiry, to be filled in either by a few officials or by many
individuals, and the proper drawing up of this
form is one of the chief tasks in a good investigation.
Before this form is issued it is necessary to formulate
a complete scheme of the whole undertaking, and even to have
some idea of what the resulting figures will be, so as to be
able to arrange the details of the organization on the right
scale, and adjust the tools used to their purpose. As already
pointed out, the object whose measurement is wanted is not in
general exactly that which can be measured, and the measurable
quantity nearest to it must be found ; e.g., when the average
annual earnings of the working class were in question, the
quantity first measured was the average weekly wage. Then
some technical knowledge of the particular subject is needed ;
and, if not possessed, a preliminary inquiry on a small
scale may be necessary to show how to fit means to ends.
The people who possess the information required must be
discovered and interrogated at first hand. The questions put
must be those which will yield answers in a form ready for
tabulation, and the scheme of tabulation must
therefore be thought out beforehand. The questions
must be so clear that a misunderstanding is impossible,
and so framed that the answers will be perfectly definite, 
a simple number, or " yes " or " no." They must be such as 
cannot give offence, or appear inquisitorial, or lead to partisan 
answers, or suppression of part of the facts. The mean must 
be found between asking more than will be readily answered 
and less than is wanted for the purpose in hand. The form 
must contain necessary instructions, making mistakes difficult, 
but must not be too complex. The exact degree of accuracy 
required, whether the answers are to be correct to shillings or 
pence, to months or days, must be decided. Every word and 
every square inch of space must be keenly criticised. A 
little trouble spent upon the form will save much inconvenience 
afterwards. 

[Tabulation.]

(2.) In considering what method is to be adopted for tabulation,
we must remember that the investigation is intended to
furnish the answers to certain definite questions (how many
people, what wage, what price), and each
column must present some total which is relevant to these
questions. The exact scheme employed will differ in different
inquiries. In the population census, the tabulation is almost 
automatic ; in the wage census, the best and simplest way to 
show the grouping about the average wage in each occupation 
had to be specially devised ; in trade statistics the number of 
different categories to be adopted and the limits of each raise 
difficult questions. In general, the scheme of investigation re- 
quires knowledge of certain groups ; and the totals resulting 
from tabulation should show the numbers of items in these, so 
that, after tabulation, instead of the chaotic mass of infinitely 
varying items, we have a definite general outline of the whole 
group in question. 

[Averaging and summarisation.]

(3.) When the raw material is worked up to this point, skill of
a different kind is wanted. From the numbers obtained, we
have to pick out the significant figures ; so to
present the totals and averages as to give a
true impression to an inquirer ; to summarise briefly the
information obtained ; to concentrate the mass into a few 
significant averages, and to describe their exact meaning in 
the fewest and clearest words, for it is the result of this 
concentration which will generally be used and quoted. To 
do this skilfully requires an acquaintance with the method of 
averages and the use of diagrams. It may further be necessary 
to fill in unavoidable gaps in the figures in order to supply esti- 
mates for intermediate years; this needs a study of the dangerous 
method of interpolation. Finally, the verbal description of the 
process, its genesis and results, and an estimate of its accuracy 
must be added, and then the investigation is complete. 
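
As a minimal sketch of the simplest kind of interpolation, assume that a quantity is known at two dates only and that it changed uniformly between them. That assumption is precisely what makes the method dangerous, since nothing in the figures themselves guarantees it. The function name and the figures below are hypothetical and serve only for illustration.

```python
def interpolate_linear(year_a, value_a, year_b, value_b, year):
    """Straight-line estimate for `year`, lying between two known years."""
    if not (year_a < year < year_b):
        raise ValueError("year must lie strictly between the two known years")
    # Fraction of the interval that has elapsed at `year`.
    fraction = (year - year_a) / (year_b - year_a)
    return value_a + fraction * (value_b - value_a)


# Hypothetical figures: a quantity recorded in two census years only.
estimate = interpolate_linear(1881, 1000.0, 1891, 1200.0, 1886)
print(estimate)
```

The estimate is only as good as the assumption of uniform change; a sudden movement in an intermediate year would pass entirely unnoticed.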

[Criticism of results.]

(4.) The student who has to make use of statistics should not
be content to take the results of an inquiry on authority, but
ought to acquaint himself with all these details of
method. Before the results can be criticised, it
is necessary to know the complete genesis of the figures ; 
whether the whole field was covered ; exactly whence the 
information tabulated was obtained ; whether there was a 
possibility of bias; how nearly the individual answers were 
correct; whether the informants really knew the facts they 
related, and if they were likely to state them correctly. The 
published statement of the results should show clearly the 
whole scheme of collection so as to make this criticism possible ; 
in particular, specimens of the original blank forms should be
included, so that the reader can judge whether the original
answers correspond exactly to the form of tabulation employed. 
Internal evidence often leads to much useful criticism. It can 
be seen whether the number of returns for each group is 
proportional to its importance, or if a specially important 
figure depends on only slight evidence. The continuity of the 
figures can be examined, and the causes of sudden gaps in- 
vestigated. The returns can be divided into sample groups, 
and the extent of the correspondence of these groups to the 
general result will often indicate whether the returns are 
sufficiently general. A careful study of the more minute 
tabulations may show within what percentage the final numbers 
may be expected to be correct. 
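
The check by sample groups described above can be sketched as follows. This is an illustration under stated assumptions, not a prescribed procedure: the returns are dealt into groups in rotation, the quantity compared is the simple average, and the wage figures are hypothetical.

```python
from statistics import mean


def sample_group_means(returns, n_groups):
    """Deal the returns into n_groups in rotation and average each group."""
    groups = [returns[i::n_groups] for i in range(n_groups)]
    return [mean(group) for group in groups]


def max_relative_divergence(returns, n_groups):
    """Largest divergence of any group average from the general average,
    expressed as a fraction of the general average."""
    overall = mean(returns)
    return max(abs(m - overall) / overall
               for m in sample_group_means(returns, n_groups))


# Hypothetical weekly wages in shillings.
wages = [24, 26, 25, 27, 23, 25, 26, 24, 25, 25, 27, 23]
group_means = sample_group_means(wages, 3)
divergence = max_relative_divergence(wages, 3)
print(group_means, divergence)
```

If the group averages lie close to the general average, the returns are probably sufficiently general; a wide divergence suggests that the final figure rests on too few or too unrepresentative returns.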

The most important function of statistics is to produce 
evidence showing the relation of one group of phenomena to 
another ; for the information obtained is presumably intended 
as a guide for action, the guidance is generally needed to show 
what actions are likely to produce certain desired effects, and 
this is best investigated by finding how such effects have been 
produced in the past. We have then to determine whether 
changes in one measurable quantity (e.g., the duties on corn)
have produced changes in another (e.g., the amount of pauperism);
a problem generally insoluble, but one on which most light 
can be obtained by the study of the relevant statistics in the 
light of mathematics, the mathematics of probability, and it is 
in this particular branch of mathematics that recent statistical 
progress has been chiefly made. 

Such questions, however important, are somewhat abstruse, 
and presuppose a certain amount of technical knowledge which 
is not in the possession of the general student. The plan of 
this book is to postpone all questions requiring such technical 
or mathematical knowledge to the Second Part, and to confine 
our earlier discussions to problems needing no special training 
or equipment.



CHAPTER III. 
ILLUSTRATIONS OF METHOD. 






Section i. — The Population Census. 

The population census will provide good illustrations of the
principles laid down in the last chapter, both because we shall
be at first on familiar ground, since every one knows
its scheme, purpose, and details, and because the
form of inquiry used for the collection of the original data
brings out very prominently the difficulties met with in detailed
statistical investigations.

[The choice of questions.]

The first thing to be considered is the exact object for
which the census is undertaken. It is for demographical purposes ;
to supply information as to the numbers and
local distribution of the population, the numbers of
each sex and age, their so-called civil condition (i.e., whether
single, married, or widowed), and their nationality. This is the
minimum information necessary for administrative purposes.
In addition to these facts there are very many others which the
statesman and the economist wish to know about each member
of the population, and the census form is the only means in
England of collecting universal data ; the question as to which
of these shall be investigated and which neglected, is decided
more by expediency than on principle. Of these
desiderata the following may be mentioned : the
size and structure of the family, its position in the social scale, 
the economic position of its head ; the nature of employment 
of its members, the wage or income of each member and of the 
family as a whole, the rent and size of their house, their educa- 
tional condition, the ages at which they commenced or retired 
from work, their migrations, their combination in religious or 
other bodies, and their infirmities. It is clear that some of 
this information must be dispensed with, if the form is not to
be overcrowded, and if the tabulation is to be finished in any
reasonable time ; and an examination of the general nature
of the questions which can suitably be put will show how the 
necessary selection is made. 

[Ability to answer.]

First, the questions must be those which the informant is
able to answer. Now, if the questions were only to be put
to educated and methodical persons, doubtless a
full account could be given of the family migrations
and of the ages at which each member had been at work ;
but the peculiarity of the census is that it is universal, and 
the questions must be such that the least educated and most 
unthrifty householder shall be able to answer ; in many cases 
such facts would have been unrecorded and forgotten. 

Secondly, the questions must be perfectly definite, so that
there can be no doubt as to what the right answer should be.
The only answers which are of value to the
statistician are "yes," "no," or a simple number.
Adjectives and adverbs such as many, often, partly, &c., bear 
different numerical meanings to different people, and, though 
they may express fairly clearly the position of an individual, 
are nearly useless for tabulation,* which is their only purpose 
so far as the census is concerned. Thus the question as to 
education would have to be, not " state whether well, moderately, 
or badly educated," but " state at what age school was left," or 
"how many years at school?" But even if such questions 
were not excluded by our first test, by the forgetfulness of the 
informant, the statements given would be of little practical value, 
and very often incorrect. An inquiry as to wage and income 
could not be made sufficiently definite without so many questions 
as to require a form to itself ; for wages, as we shall see when 
considering the Wage Census, require very careful definition, and 
many subsidiary questions must be put to get a proper estimate ; 
the simple query, "what is your weekly wage or annual income?" 
would be answered on so many varying principles that the result 
would be valueless. 

* But see p. 138, infra.

Thirdly, the questions must be such as will be answered
truthfully and without bias. There is hardly a demand on
the census form which would not be excluded, if
this rule was too rigorously enforced, as we shall
see immediately. The worst offender in this respect is the
question, Employer or employed? For though there are many
cases in which a man is both employer and employed so that
this question should be excluded by our second test, many
persons consciously exaggerate their social importance by
erroneously replying the former. Questions relating to social
position must generally be excluded by this rule.

[Reluctance to answer.]

Fourthly, the questions must be those which will be answered
willingly, and must therefore not be inquisitorial, or such as
to raise apprehension of a change of law or an
imposition of taxes. Questions as to membership
of trade unions, or of friendly societies, or as to insurance,
would be thought inquisitorial. Many would refuse to state
their incomes, holding it to be no one's concern but their own.
Questions as to rent might be regarded as possibly leading
to taxation. Questions as to religion are badly answered, as
was shown in the evidence before the Census Committee of
1890,* and should be excluded by each of these four rules.
Some persons do not know what their religion should be 
named, others would find the question indefinite, others would 
deliberately answer wrongly, and many not at all. 

The questions on the census form† not excluded on one
or other of these grounds are Nos. 1, 2, 3, 4, 5, and 10;
these are fairly definite, and householders are generally able 
and willing to give correct answers to them. Questions 6, 7, 
8, 9, and 11 compete with many others, which lead to equal 
inaccuracies, for a space on the census sheet. No. 6 has long 
held its place because of its great importance ; Nos. 7, 8, 
and 9 are on their trial. A further discussion of the merits 
of some of these is to be found in the Report of the Com- 
mittee already mentioned ; here it is only intended to indicate 
the general grounds of inclusion or exclusion. 

* Report of Committee on the Census, 1890 (C.-6071).
† Facing p. 23.

[Filling up of the form.]

So far we have not discussed the important question as to
who should fill in the form. If, as in the English Census,
it is to be filled in by the householder, the questions
must be much simpler in matter and words
than if it is to be filled in by an official teller. In the latter
case the form may be much more complicated, the questions
more inquisitorial and such as might lead to indefinite answers
on the part of ignorant people ; for the teller would insist on
an answer, be able to exclude those obviously wrong, and
cross-question till the indefinite answers were so altered as to 
allow definite tabulation. In a great and complex undertaking 
like the Census, where many tellers must be impressed for a 
single day's work, their instructions and the general plan must 
be sufficiently simple ; but as the extent of an inquiry con- 
tracts, the tellers can receive more complete instructions, and 
the information requisitioned may be more complex. This is 
of most importance in connection with columns 6-9. 

[Shape of blank form.]

The general shape and appearance of the sheet needs
attention. If the structure of the family is to be shown, the
answers are best given on a single sheet, which
must contain enough lines for the largest ordinary
household, so that the trouble of fastening together of many
couples may be avoided, and tabulation not be hindered. The
spaces must contain plenty of room for answers in uneducated
handwriting, without making the whole so large as not to lie
easily on a desk. The instructions must be distinct and visible, 
and placed in close connection with the answers ; to further 
this, a skilful use may be made of capitals, italics, and different 
founts of type. On the form facing p. 23, those in use are 
roughly reproduced in miniature. 

[Purpose to be shown.]

The form should always show for what purpose the figures
are collected, and how they will be used, in order to enlist the
support of the informant and allay misapprehension.
The extent to which this should be done depends
a good deal on whether the filling-up is compulsory, as in 
the population census, or voluntary, as in the wage census. In 
the case before us no preamble is necessary, since every one 
knows the main features of a census, and most are willing to 
further its objects ; but it must be shown that the inquiry is 
sanctioned by Parliament, and that compliance is compulsory. 
This is done on the back, on the fold which is outside before the 
form is opened ; and even though penalties are threatened 
against absence of or falsification of returns, the last sen- 
tence describes the object of the inquiry and guarantees the 
informant against malicious use of his answers. Where in- 
formation is voluntary, a careful letter should be printed and 
circulated with the form, persuading the informant to give his 
assistance. 

[Subsidiary information.]

While the main part of the form is filled in by the householder,
other parts are filled in by the officials, and with very
little trouble a good deal of subsidiary information
can be collected in this way. On the outside the
Parish, Town, Sanitary District, Street, and Number are endorsed, 
so that the answers can be tabulated for any of these districts. 
The teller could also, as he took the form, enter the number of 
stories to a house, which is not done in the English Census, and 
other information as to the style of house and street might be 
endorsed. In a more intensive investigation, Mr Charles Booth's 
assistants, for instance, could be trusted to come out of a house 
with an accurate knowledge of many interesting details. 

[Lines and columns.]

We can now proceed to the individual criticism of the form
in the light of the rules suggested above. In the first place,
even the arrangement of columns is not perfect. To
labourers who are not in the habit of writing at all,
and who have (to judge from election posters) to be instructed 
how to put their mark in the right place on a ballot paper 
(many papers being destroyed simply through ignorance), this 
arrangement of horizontal and vertical columns would be con- 
fusing, and without help they would not gather at all what they 
were to do. They would fill up more easily a paper in which 
the answers were to follow the questions immediately : — 

    State your Name
    State your Age
    State your Sex
    Unmarried, Married, or Widowed

and so on.



This form, however, could only be used if a separate paper were 
to be filled in for every individual, children and all. Other 
elementary matters might be improved. On looking through the 
form a great number of words and phrases will be found which 
are not in common use, e.g., abode, dwelling (as a noun), elsewhere,
East Indies, imbecile, "precise" infirmity, general term,
column, the foregoing, condition as to marriage. In column 1
the phrase, "name and surname" reads as though surname were
not a name, and perhaps the word "surname" is not in general
use, so that the printed word might be taken to mean title, and
the confusing answer "none" written under it. Does the in- 
struction " write after " mean to the right, or below ? 

[Criticism of the questions. "Slept or abode."]

The first question, which for the general purpose of the
census should be the most definite of all, leaves some room for
doubt. What of a night-watchman returning at
[hour illegible], or a printer at 2 A.M. ? What constitutes a
traveller : does a man who leaves the house before midnight, or
a man who goes down to Brighton by the theatre
train come under the term ? Is midnight or 2 A.M.
the critical time ? What of a person who dies at 1 A.M., or a
birth at midnight ? How is the householder to know whether 
any of his establishment are returned elsewhere? Since too 
many instructions only lead to confusion, the tellers should be 
specially taught the answers to such questions. 

[Meaning of population.]

The very meaning of the phrase "population of a district"
is open to much doubt. In France "la population de fait,"
which consists of all present in the given district
at the given moment, is distinguished from "la
population de droit," which consists of all usually resident in the 
district, including those temporarily absent, and excluding those 
only momentarily present, and from " la population municipale," 
which is "la population de droit," less prisoners, hospital patients, 
scholars resident in schools, members of convents, the army, and 
so on.* The English Census counts "la population de fait." 
In the United States we find a "constitutional population," 
which excludes residents in Indian Reservations, the Terri- 
tories, and the District of Columbia ; the " general population," 
which includes in addition the Territories (except the Indian 
Reservation, Indian Territory, and Alaska) ; and the "total
population," which includes all excluded in the former.† In the
future questions will arise as to the inclusion of the Philippines 
and Cuba. Notice that the Channel Islands and the Isle of 
Man are included in the English Census. 
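
The three French definitions can be contrasted in a short sketch. The record layout, field names, and the four persons below are hypothetical, introduced only to show how the same district yields different totals according to the definition adopted.

```python
from dataclasses import dataclass


@dataclass
class Person:
    present: bool         # in the district at the census moment
    usual_resident: bool  # usually resident in the district
    institutional: bool   # prisoner, hospital patient, boarding scholar,
                          # member of a convent, soldier, and so on


def population_de_fait(people):
    """All persons present at the given moment."""
    return sum(p.present for p in people)


def population_de_droit(people):
    """All persons usually resident, whether present or not."""
    return sum(p.usual_resident for p in people)


def population_municipale(people):
    """The population de droit, less the institutional classes."""
    return sum(p.usual_resident and not p.institutional for p in people)


# Hypothetical district of four persons.
district = [
    Person(present=True, usual_resident=True, institutional=False),   # at home
    Person(present=False, usual_resident=True, institutional=False),  # away
    Person(present=True, usual_resident=False, institutional=False),  # visitor
    Person(present=True, usual_resident=True, institutional=True),    # soldier
]
counts = (population_de_fait(district),
          population_de_droit(district),
          population_municipale(district))
print(counts)
```

Here the first two totals happen to agree although they are made up of different persons, which is a reminder that equal totals do not show that the same thing has been counted.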

* See Bertillon, ibid., p. 146.

† Willcox : Area and Population of the United States at the XI.
Census, a book which gives a very useful criticism of the accuracy of the
most elementary data of statistics. It is a pity that space is wasted in a
useless attempt to supplant the word "statistician," which has now a definite
meaning, by the word "statist," which has another equally definite meaning.
Does Dr Willcox wish to substitute "statics" for "statistics" ?

It is possible to find difficulties in filling up all the columns 
except No. 4. For illustration, consider how column 2 should 
be filled in in the case of a cousin who was a "paying guest," or 
a relation who was a visitor ; for column 3, is a divorced person 
single or a widower, and what of a woman who is doubtful 
whether her husband is lost at sea? Errors come from No. 3 
because many unmarried people call themselves married. 

[Age.]

It is well known that column 5 is wrongly filled in for two
reasons : one, that elderly people often do not know their ages
accurately and enter them to the nearest round
number, so that the returns congregate at 40, 50,
60: the error thus arising is eliminated by tabulation in the 
groups 35-45, 45-55 years, &c., and for more minute tabulation 
the groups 3-7, 8-12, 13-17, &c., are suggested : the other is that 
many ladies habitually enter their ages too low ; in this case 
also the Registrar-General is able to deduce nearly correct 
totals. 
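
The grouping about round numbers can be sketched as follows. Two assumptions are made here that the text does not settle: the group written "35-45" is taken to mean ages 35 to 44 last birthday, so that the groups do not overlap, and the lowest group is cut off at age 0. The function names and the list of ages are hypothetical.

```python
from collections import Counter


def decade_group(age):
    """Ten-year group centred on a multiple of ten, e.g. (35, 44) for 40."""
    centre = ((age + 5) // 10) * 10
    return (max(0, centre - 5), centre + 4)


def five_year_group(age):
    """Five-year group centred on a multiple of five: 3-7, 8-12, 13-17, ..."""
    centre = ((age + 2) // 5) * 5
    return (max(0, centre - 2), centre + 2)


# Hypothetical returns showing the crowding at 40 and 50.
stated_ages = [38, 40, 40, 40, 41, 43, 49, 50, 50, 50, 52, 60]
by_decade = Counter(decade_group(a) for a in stated_ages)
print(sorted(by_decade.items()))
print(five_year_group(3), five_year_group(8), five_year_group(17))
```

Because each round number sits in the middle of its group, a person who rounds his age up or down by a few years is still counted in the right group, and the crowding disappears from the totals.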

It is to be noticed that, since the ages stated are those 
"last birthday," the age will on the average be given six
months too low, and, in fact, the ages given as 17, e.g., should
be scattered nearly uniformly over the months to the eighteenth 
year. 
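
The six-month correction can be written out as follows, assuming that birthdays are spread nearly uniformly over the year. The symbols are introduced here for illustration only: n is the age stated, U the fraction of a year elapsed since the last birthday, and X the exact age.

```latex
\[
  X = n + U, \qquad U \sim \mathrm{Uniform}[0,1), \qquad
  \mathbb{E}[X] = n + \int_0^1 u \, du = n + \tfrac{1}{2}.
\]
```

On this assumption the stated age n falls short of the average exact age by half a year, whatever the value of n.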

The most important criticisms of the census-schedule are to
be made on columns 6-9. It will not be expedient here to go
into all the questions raised before the Committee
on the Census as regards an industrial census.
While there can be little doubt that a thorough census of occu- 
pations would be best undertaken separately, and on somewhat 
different principles from the population census, it is certainly 
better, till opinion is ripe for so radical a change, to include 
in the present census the best questions we can as to occupa- 
tions, than to omit them altogether in despair of accurate 
results. 

The objects aimed at, which we must always keep in mind 
when criticising special questions, are two : to find the number 
employed in each trade and industry, that is, so to say, to 
form vertical divisions ; and to find the number in each rank 
or grade of employment (labourer, artisan, employer, &c.) in 
horizontal divisions ; so that the tabulation may give some such 
result as — 



                         Textile Industries.

               Cotton.    Wool.    Linen.    Totals.
Employers
Managers
Overlookers
Spinners
Weavers
Labourers
Children

Totals











The necessary minimum of information would be given by 
such answers as 

Legal - Solicitor - Managing clerk.
Mining - Coal - Hewer.
Metal-worker - Iron - Smith's striker.
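
How three-part answers of this kind lead to the vertical and horizontal divisions may be shown by a small Python sketch. The returns below are invented for illustration, and the function name is hypothetical; nothing here is taken from actual census data.

```python
from collections import Counter

# Hypothetical returns, each already coded as (industry, branch, grade).
returns = [
    ("Textile", "Cotton", "Spinner"),
    ("Textile", "Cotton", "Weaver"),
    ("Textile", "Wool", "Weaver"),
    ("Textile", "Linen", "Employer"),
    ("Textile", "Cotton", "Weaver"),
]


def cross_tabulate(rows):
    """Count returns by (grade, branch) cell, with both sets of totals."""
    cells = Counter((grade, branch) for _, branch, grade in rows)
    by_branch = Counter(branch for _, branch, _ in rows)  # vertical divisions
    by_grade = Counter(grade for _, _, grade in rows)     # horizontal divisions
    return cells, by_branch, by_grade


cells, by_branch, by_grade = cross_tabulate(returns)
print(cells[("Weaver", "Cotton")], by_branch["Cotton"], by_grade["Weaver"])
```

An answer such as "miner" or "smith" supplies only one of the three parts, and so cannot be placed in any single cell of the table.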

Now the simple instruction, "State your occupation," would of 
course not lead to information of this sort. The coal-hewer 
would simply say miner ; the clerk, managing clerk ; the striker, 
very likely smith. To explain what is wanted and avoid mis- 
takes, the question is not put on the face of the form at all, but 
the informant is referred to the back, half of which is devoted to 
instructions relating to this column. These are lucid, carefully 
picked out with capitals and italics, comprehensive, brief and to 
the point. No one who wishes to fill in the form rightly, and is 
sufficiently educated to understand simple instructions, can easily 
go wrong. Yet, as a matter of fact, these instructions are in very 
many cases neither read nor followed ; and this fact is very im- 
portant in connection with the general study of blank forms of 
inquiry. Forms issued to people uninterested in the object in view 
will generally be filled in with the least possible expenditure of 
time and intelligence. Hence two courses are open : to reduce
the question to the simplest possible form, and make the best of
the result; or not to allow the informants to write in their own
answers, but to take them viva voce by means of a teller, who
has mastered the instructions, and has the necessary legal force 
behind him to compel information. The latter course entails 
time and expense. 

The result of the present system of inquiry, combined with 
a faulty method of tabulation, which it to some extent makes 
necessary, is that we have no reliable census of occupations for 
the United Kingdom. The present figures break down both 
from faulty data and from insufficient tabulation directly we 
attempt to make any calculations depending on them. 

An attempt has been made to correct to some extent our
ignorance of the relative numbers of unskilled and skilled
labourers, employers and employed, by columns
7, 8, and 9. The headings are not a model of
clearness; there is not the ordinary imperative "state", or
"write," nor is one told on the front of the form whether to
write Yes or No or to make a mark in the appropriate column,
nor is the distinction between the three headings a perfectly
definite one ; but still one is hardly prepared for the following
statement in the report :*

"In numerous instances, no cross at all was made ; in many 
others, crosses were made in two or even all three columns, and, 
even when only one cross was made, there were often very 
strong reasons for believing that it has been made in the wrong 
column. Oftentimes this use of the wrong column can scarcely 
have been other than intentional ; being dictated by the foolish 
but very common desire of persons to magnify the importance 
of their occupational condition. This desire must have led 
many subordinates to return themselves as employers rather 
than as employed, for it is only on this supposition that we can 
account for the otherwise unintelligible fact that, under several 
headings, there are actually, according to the returns, more 
employers than employed, more masters than men. . . . We 
hold [these returns] to be excessively untrustworthy, and shall 
make no use whatsoever of them in our remarks." 

This attempt and its result are of the greatest importance to 
all who try to draw up forms of inquiry. 



* General Report on the Census of 1891, p. 36 (C. 7222 of 1893).




Before leaving the subject, it should be mentioned in passing 
that we cannot deduce directly from our census the number of 
persons dependent on a particular trade for their living ; that is 
to say, the number of employers, their families (not otherwise 
returned) and domestic servants, and the number of employés
and their dependent families. This, the most important total 
for estimating the relative importance of different trades of the 
country, is not tabulated, though such tabulation has been found 
possible in other countries, and we are dependent on the esti- 
mates of statisticians for such totals.* 

To see how the information given by the answers on the 
census schedule can be worked up into detailed specific numbers, 
it is only necessary to look at the diagram and table prefixed 
to each of the sections relating to special trades in Mr Booth's
Life and Labour of the People (e.g., vol. v., p. 46).†

* See Booth in Statistical Journal, vol. xiiJK. † See p. 7^^ infra.






Section 2. — The Wage Census. 

The main differences between the wage census, taken in
1886, and the general population census are : (1) that the
filling up of the forms in the wage census was voluntary ;
(2) that their correct filling up required a higher degree of
intelligence and education. As before, we must consider first
the object which the wage census was intended
to fulfil : it was to describe the earnings of the
people of the United Kingdom, to compare the rates of wages
trade by trade, and to find the relative numbers earning
at each rate. What is the best quantity to measure with this
object in view ? As a preliminary question should we take the
day, week, or year as the unit of time? Clearly we
shall not be able to compute weekly wages if we
only obtain daily, for the week's work varies from four to seven
days in different occupations. The week's wage is a more 
definite quantity ; but the simple comparison of weekly wages 
in different trades will be deceptive, because most trades are 
busier at one season of the year than at another, and in many 
the difference between season and season is very great ; in any 
particular week, then, we may be comparing the best season of 
one industry with the worst of another. To avoid this error, 
and because we do not know how many full weeks' wages are 
obtained in a year, except in a few non-intermittent trades, it 
would seem best to take the year as unit ; but the direct cal- 
culation of an individual's annual earnings is practically impos- 
sible. The employer is not acquainted with this sum, for in 
large establishments the hands are continually changing, and 
one man will be paid by two or more masters in the same 
year ; and even in a factory with a nearly constant personnel, 
the weekly amounts paid to individuals are not in general so 
tabulated as to be easily summed, and the working out of the 
totals would require a prohibitive amount of clerical labour. If 
we turn to the workman, on the other hand, we shall find in the 
majority of cases that no accurate account has been kept of 
earnings through the year, and it would only be by careful 
individual examination, impracticable on any large scale, that
an estimate could be made ; in many cases the men, even if
willing, would be quite unable to give a connected account of 
their earnings during the past twelve months. 

It seems clear that we must adopt a smaller unit, and since 
most wages are paid weekly, a week is the most natural one. 
The subsidiary questions which will lead best to an estimate of 
annual earnings will be discussed below. The answer to the 
former question, as to the best quantity to investigate, is in- 
direct ; the only individual measurements we can obtain directly 
are the week's wages, but these may be supplemented by
estimates en masse.

Next, who possess the information we require? Clearly
both employers and employed, and in an ideal census the
answers would be obtained from both groups ;
but considerations of simplicity, cheapness, and
accuracy are all in favour of applying to employers alone.

If employés were to be interrogated the procedure would be
as follows. Draw up a form on the analogy of the census form, 
describe very briefly the purpose of inquiry, add a short series 
of concise, lucid, simple questions in suitable type and with 
careful spacing, such as will lead to the minimum information 
required ; let these forms be left to be called for, and when 
collected, let the tellers have time and opportunity to examine 
and correct them. It is clear that this method would entail an 
even more expensive organization than the population census, 
and as the result of experiment it may be doubted whether the 
maximum of accurate information that could be thus obtained 
would come up to the minimum that would be of use. A partial 
inquiry could, however, be carried out by means of trade 
unions if they were willing to give serious assistance. 

The method of inquiry among employers was as follows : — 
Suitable blank forms and an explanatory letter were sent by post 
to all employers, whose addresses could be found, in the industries 
selected for investigation, and the answers were returned to the 
central office by post. This is far simpler and cheaper than the 
suggested scheme for inquiry among workmen, requiring far fewer 
forms and only a small staff of clerks. With business men it is a 
simpler matter to post the return when completed than to keep it 
for collection by hand. Since there is no personal intercourse over 
the matter it is especially necessary that the questions should be
lucid, for the additional correspondence necessary to rectify
errors is a source of worry at both ends. A copy of one of these 
forms, abridged only in the number of subdivisions, is subjoined 
here and on the following page. 



WAGE CENSUS.
Return of the Rates of Wages Paid in Silk Manufactures.

Name of Factory or Firm ______
Address ______

Note. It is requested that the salaries of clerks and managers may be excluded.
The return is of wages of working men only.

Numbers employed on ______ 1886 : No. ______
Amount paid in Wages in the year 1885 : £ ______
Highest weekly amount paid in 1885 : £ ______ Date ______
Number of Hands paid in that week : No. ______
Lowest weekly amount paid in 1885 : £ ______ Date ______
Number of Hands paid in that week : No. ______

State the present average rate of pay for overtime : that is, whether
overtime is reckoned as time and a quarter or time and
a half, &c., or in what way reckoned ______

State whether overtime is at present being worked, and how much ;
or whether less than full time, and how much less ______



Current Rates of Wages and Hours of Labour per Week of Persons employed
in each Branch of the Silk Manufactures, on ______ 1886.

Current Rates of Wages Paid and Number of Hours of Labour per Week when
in full work, but exclusive of Overtime.

Note. State the Number of Hours of Labour per Week, whether the Workers
were paid by Time or Piece-work, and if paid by Piece-work give the
amount earned in a week, exclusive of Overtime.

Columns (the sub-headings under each are illegible in this copy) :
    Males :    Men.    Lads & Boys.
    Females :  Women, 18 years and upwards.    Girls, under 18 years.

Description of Occupation.
N.B. It is requested that this list of occupations may be revised
where necessary.

Rows (subdivided into Time and Piece) :
    Silk Throwing :  Parters, Winders, Cleaners, Spinners, Doublers,
                     Sorters, Boilers, &c.
    Silk Spinning :  Openers and ..., Dressers, Preparers and Carders, &c.
    Silk Weaving :   Winders, Warpers, Warp Pickers or Clearers, Doublers.







The measurement of the annual earnings of groups of
workpeople was the ultimate object of the inquiry. Annual
earnings are composed of many different items,
of which the following are the most important :
ordinary weekly wages, pay for overtime, special payment
for special work (e.g., of builders if sent to a distance), or at
special seasons (such as the harvest) ; and payments not in 
cash, such as free or reduced house-rent, free or cheap coal, 
and special goods at cheap or wholesale prices (such as cloth in 
textile factories, or potatoes for agricultural labourers). 

When payment in kind is at all general or important, it is 
generally better to proceed on a different method entirely, e.g.,
that followed by the Agricultural Sub-Commissioners of the 
Labour Commission. When it consists of only one simple item, 
such as a house rent-free, it can form the subject of an additional 
question on a form similar to that on p. 35. In the silk industry 
this does not occur ; but this discussion shows the necessity of 
preliminary knowledge on the part of the investigator before 
the right form of inquiry can be drawn up. 

We have left for consideration the weekly wage, and over- 
time and special payments, the last two of which can be grouped 
together. The ordinary weekly wage is a sufficiently general 
and definable quantity in most subdivisions of most industries.
A foreman could generally state how much is earned in an 
ordinary full week for each of the hands under him. In many 
cases there is an hourly or weekly sum regulated by a trade union, 
as in the building trades. In others, as in the cotton industry, 
piece-rates are so regulated as to bring out a definite sum 
for the week's work graduated in relation to the difficulty of the 
task ; in general, a very rapid survey of the wage-book will show 
what the worker in each subdivision will make on an average. 
Thus the average weekly wage in an ordinary full week can be 
found with considerable accuracy, but this takes us only part of 
the way in the calculation of annual earnings ; we need to know 
in addition to this how many full weeks are made in the year. 
It is the method by which this is attempted on the printed form 
that is open to most criticism. The questions used are on
page 35, and afford a good example of the general difference
between the quaesita and the data which are attainable. The 
quaesitum is : To how many full weeks' wage are the annual 
earnings equivalent allowing for slack weeks and overtime? 




The first crucial question to decide is : Are we to allow for an
average loss of time, say a week in the year, through sickness, or
are we to allow only for time lost through failure of
work? Since sickness is an individual not a general
misfortune, it will be better to exclude it if possible.
Now overtime in one season, especially if its wages are on a
"time-and-a-quarter" or "time-and-a-half" basis, very quickly tends
to balance slack time at another season, though it may be sup- 
posed that it is rarely the case that more than the normal week's 
wage is averaged through the year. Thus it will be logical as 
well as simple to estimate the year's earnings as so many normal 
weeks' wages. For example, if we found that two weeks were 
lost through sickness and three through the mill stopping, and 
that overtime in one busy month had added wages equivalent to 
two normal weeks, we should have forty-nine weeks' full wage. 
The figures which will give this result will be the total sum paid 
in wages in the factory in the year divided by the aggregate normal 
week's wage of the people dependent on the factory, supposed 
all at work. Thus, if 1,200 hands (men, women, and children)
would, if all at work, make £1,000 in a normal week, and this
was the average number dependent on the particular mill, and if
£48,000 was paid in the year in wages, annual earnings would
be equivalent to forty-eight normal weeks, and earnings would
average £40. Now the total paid in wages is generally kept
separate in business accounts, but the number dependent on the 
mill for work is often not known accurately ; for the personnel 
of a large establishment is subject to continual change, and the 
manager would not know whether a person who left went to 
another mill or got no work. The total number of all who had 
worked there during the year would be too great for this purpose, 
and the number at work in a normal week too small. The 
number open, perhaps, to least objection is the number at work 
in the busiest week of the year ; for those absent except through 
sickness when trade is busy cannot be said to be dependent on 
the factory, but if not at work elsewhere are among the per- 
manent unemployed ; very few workpeople indeed will be 
taking their holiday at a busy time, and it may reasonably be 
supposed that all the factories in the same industry will have 
their busy and slack seasons at nearly the same time. The 
answers then to the printed questions (Total paid in year, and
number of hands in busiest week) tell us all we need to know,
if we may make this assumption ; for then the total sum paid as
wages in the year, divided by the maximum number employed 
in the busiest week, gives the average annual earnings. To find 
the equivalent number of normal weeks, multiply the maximum 
number employed by the average wage found on the second page 
of the form, so that the product shows the aggregate weekly 
wage if all were employed, and divide the total paid in the year 
by this product. 
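
The two divisions just described may be set out in a short Python sketch, using the figures of the imaginary mill mentioned earlier in this paragraph (1,200 hands, £1,000 in a normal week, £48,000 paid in the year). The function names are hypothetical, and exact fractions are used merely to avoid rounding in the illustration.

```python
from fractions import Fraction


def average_annual_earnings(total_paid, max_hands):
    """Total wages paid in the year divided by the number of hands
    employed in the busiest week."""
    return Fraction(total_paid, max_hands)


def equivalent_normal_weeks(total_paid, max_hands, avg_weekly_wage):
    """Total paid in the year divided by the aggregate normal weekly
    wage bill, supposing all the hands at work."""
    return Fraction(total_paid) / (max_hands * avg_weekly_wage)


total_paid = 48_000                  # pounds paid in wages in the year
hands = 1_200                        # hands at work in the busiest week
avg_weekly_wage = Fraction(1_000, hands)  # pounds per head in a normal week

earnings = average_annual_earnings(total_paid, hands)
weeks = equivalent_normal_weeks(total_paid, hands, avg_weekly_wage)
print(earnings, weeks)
```

The first quantity needs only the two answers on the first page of the form; the second needs in addition the average normal wage from the second page.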

In the Cotton industry the sum of the greatest numbers
employed (if these may be taken as equivalent to the
numbers employed in those weeks when the wage bill
was highest) was, in 1885, 87,887. £3,148,566
was paid in wages in that year in the factories
making returns. Average annual earnings were therefore

    £3,148,566 ÷ 87,887 = £35 16s.

The average wage in a normal week in
1886 was 15s. 2½d. ; the product of this and 87,887 is £66,830.
The equivalent number of normal weeks' work is

    3,148,566 ÷ 66,830 = 47.

Hence we may conclude that, if our basis of calculation is correct,
five weeks was the average lost time at that date.
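
The arithmetic of this example can be checked with a brief Python sketch working in pence (12 pence to the shilling, 20 shillings to the pound). The helper names are hypothetical, and the year of fifty-two weeks is the assumption implied by the conclusion that five weeks were lost.

```python
def to_pence(pounds=0, shillings=0, pence=0):
    """Convert a sum in pounds, shillings and pence to pence."""
    return pounds * 240 + shillings * 12 + pence


def to_lsd(pence):
    """Convert a whole number of pence to (pounds, shillings, pence)."""
    pounds, rest = divmod(pence, 240)
    shillings, d = divmod(rest, 12)
    return pounds, shillings, d


total_paid_d = to_pence(pounds=3_148_566)             # wages paid in 1885
hands = 87_887                                        # greatest numbers employed
weekly_wage_d = to_pence(shillings=15, pence=2.5)     # 15s. 2 1/2d.

annual_d = total_paid_d / hands          # average annual earnings, in pence
weekly_bill_d = hands * weekly_wage_d    # aggregate normal weekly wage bill
weeks = total_paid_d / weekly_bill_d     # equivalent normal weeks

print(to_lsd(round(annual_d)))           # pounds, shillings, pence
print(int(weekly_bill_d // 240))         # weekly bill in whole pounds
print(round(weeks), 52 - round(weeks))   # normal weeks, and weeks lost
```

The quotient for annual earnings comes out a few pence above the £35 16s. of the text, which gives the result to the shilling only.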

This is not the method adopted in the General Report of the 
Wage Census ; there the total paid in 1885 is divided by the 
number employed in a given week in 1886. This number is 
certainly too small, less than the number dependent on the 
trade, and as might be expected gives on analysis absurd 
results in some cases. It is to be noticed that the method here 
described cannot be employed in those few industries which the 
employés are able to leave in the slack season in order to earn
wages in other trades which may then be exceptionally busy. 

Since there is no reason why the number absent through 
sickness should differ in the busiest week from the average 
number so absent, it is clear that the estimate we obtain for 
average lost time (five weeks in the cotton industry) is in addition
to the average time lost through sickness ; this may often be 
estimated from the returns of friendly societies. 

In the corresponding French wage census, of which the 
results were published in 1898,* an estimate of the number of 
days' work obtained in the year is formed on a different basis. 

* Salaires et Durée du Travail, 1897, pp. 15, 16.




The data collected were : (1) The variation each month of the
personnel in each industry, which is found to average 4 per cent.
for the year ; that is, for each 100 employed, 96
are found who have been in the same establishment
for as much as twelve months : (2) The differences between
the maximum and minimum numbers employed in each estab- 
lishment month by month during the course of a year, which 
are found to average 19 per cent. of the (? average) personnel.
From this we may perhaps draw the conclusion that, on an 
average, half this number, at least, are in general out of work : 
(3) The number of different persons who have been employed in 
each establishment at one time or other in the year ; this is 
found to be 140 for each 100 permanently employed, from which 
the legitimate conclusion is that the average number of unem- 
ployed is not so much as 40 in 140, i.e., 28 per cent. These two 
percentages, 9 per cent. and 28 per cent., are taken to be the
inferior and superior limits of average lack of work. This in- 
formation is more detailed and perhaps more reliable than that 
on which the method, used above for the English figures, is 
based. Data obtained from syndicates of French workmen 
indicate about 20 per cent. as the average want of work ; the
English figures obtained by the method described above from 
the whole wage census yield about 12 per cent. 
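
The two limits drawn from the French data may be reproduced in a few lines of Python. The function names are hypothetical; the figures 19 and 140 are those quoted above, and the halving of the monthly range follows the tentative inference stated in the text.

```python
def inferior_limit(range_pct):
    """Half the average difference between the maximum and minimum
    numbers employed, expressed as a percentage of the personnel."""
    return range_pct / 2


def superior_limit(persons_per_100_permanent):
    """Excess of different persons employed in the year over those
    permanently employed, as a percentage of all persons employed."""
    excess = persons_per_100_permanent - 100
    return 100 * excess / persons_per_100_permanent


low = inferior_limit(19)      # half of 19 per cent.
high = superior_limit(140)    # 40 in 140
print(low, round(high, 1))
```

These come to 9.5 and about 28.6, which the text gives as 9 per cent. and 28 per cent.; the estimate of about 20 per cent. from the workmen's syndicates lies between them.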

This somewhat lengthy discussion on the few questions 
included on the first page of the form is a good illustration of 
the necessity of considerable preliminary study before a blank 
form can properly be drawn up. Space does not allow a 
detailed criticism of the rest of the form. 






Section 3. — The Work of the Labour Department. 

The Labour Department of the Board of Trade was founded in
1893 ; its functions are to collect and publish information, chiefly
statistical, relating to the economic conditions of
workpeople, and the state of the market for labour.
Its work lay almost entirely in virgin ground ; new sources of 
information had to be tapped, new methods developed. While 
it was untrammelled by tradition, it could avail itself of the ex- 
perience of the Board of Trade, and was already in touch with 
a widely extended organization. Under these circumstances 
it was soon able to attract a comprehensive and continuous 
supply of valuable information ; and the methods by which it 
accumulates and compiles its statistics should be interesting and 
instructive to all those whose business it is to work in any 
statistical field. 

The figures which are received periodically by the Department
are published monthly in the Labour Gazette. Here the
first article each month is on the "State of Employment."
As before we must first consider the
question, What are the exact objects of the investigation
of which the results are here published? They are to find 
out how many persons are out of work in each trade and 
district, what percentage they form of all dependent on each 
industry, and how this percentage changes month by month 
and year by year. The next question is, How much of this 
can be discovered, and, if we cannot measure these numbers 
directly, what are the best allied quantities to measure? 
Since no universal register is kept of the unemployed, it would
seem easier to estimate the number employed, since
an employer can generally state how many workpeople
he has at work at any given time. If we cannot discover
the number of men at work, we may perhaps be able to find the 
number of machines, furnaces, mines, &c., at work, and deduce 
the number of men employed with them, and thence the number 
of unemployed ; or we may find for how many hours work was 
carried on in a factory or a mine ; or we may even go a step 
further back and find the amount of goods produced, and thence 



X 



42 ELEMENTS OF STATISTICS. 

estimate the other quantities ; or we may learn the total amount 
paid in wages. These are the numerical methods ; but there are 
others, useful if not so exact. We can obtain reports as to the 
condition of employment in the various districts or industries, 
not in numbers, as is generally necessary, but with descriptive 
adjectives, — such as busy, slack, improving, much the same, — 
which may lead to numerical estimates, or may serve to check 
results. Lastly, organizations for facilitating employment may 
send in returns of the applications made to them. Nearly all 
these methods are in use at the Labour Department. 

Next, who possesses the necessary information ? As regards 
the number unemployed, the only registers kept are those of trade
unions, to whose secretaries inquiries should be
addressed. The figures so obtained will naturally 
only relate to those sections of an industry where trade unionism 
exists. As regards the number employed, the masters are the 
authorities, and forms must be sent to them asking the numbers 
at work day by day, or at longer intervals. With respect to the 
number of machines at work, the number of shifts, and the 
total wages paid, the masters again have the information. For 
the amount produced, the masters, or in some cases officials to 
whom they make returns, can supply the facts. For general 
information as to the state of employment, some presumably 
competent person, in touch with all the factories in an industry, 
or all the trades of a district, must be impressed to forward 
periodical reports. The Labour Department is in touch with a 
great number of such correspondents, many of them connected 
with the trades councils of their towns. 

The question as to whether the information will be given 
impartially and willingly need not detain us long in this case ; 
for, generally speaking, the returns are simply automatic copies 
or registers of known numbers, and would only be partial if 
wilfully falsified ; and since the returns are made periodically, 
the persons concerned regard them as a matter of course, and, 
once they have commenced, continue willingly to forward the 
requisite figures. 

By the courtesy of the Labour Department I have been able 
to obtain copies of most of the forms in use. There are some 
forty in all, each suited to some special industry or method of 
investigation. 

It must be remembered that the Labour Department had to
form its own intelligence organization, and initially was obliged
to apply to persons able to give information, just
as any private investigator would. A connection
had therefore to be established with trade unions
and other societies, and with manufacturers ; and, when a nucleus
had been formed, continual efforts were necessary to extend the
organization in all directions. One or two of the circular letters
written for this purpose are given on the following pages, since
they are typical of the method which investigators must employ
to enlist the help of possible informants who are uninterested.
The points to notice are : (1) The statement of the exact
purpose for which the information is wanted ; (2) the simple
and explicit direction as to what is to be done by the
informant ; (3) the undertaking that the information will not
be used in any way that can do, or appear to do, him injury.

Here is one of the earlier letters, opening a connection : — 

Labour Department, 1894. 

Dear Sir, — The Labour Department of the Board of Trade, which 
is charged with the duty of collecting periodical statistics as to the 
condition of the Labour Market, is desirous of obtaining fuller informa- 
tion from month to month with regard to the state of employment in the 
Pig Iron Industry. For this purpose, the Department would be glad 
to receive monthly information from a large number of the employers 
in the United Kingdom as to the number of furnaces in blast and the 
numbers of workpeople employed, on the average, at each furnace. 

I shall accordingly be glad if you will be kind enough to assist the 
Department in making this inquiry complete by filling up and returning 
to me before the 4th of May the enclosed form. Postage need not be 
prepaid if the reply is addressed to " The Commissioner for Labour " 
at the address given above. 

The results of the inquiry will not be published in such a form as to
render possible the identification of particular returns. — Yours, &c. 

When the Department had organized its work, and tabulated 
and published some of its returns, the next step was to endeavour 
to achieve completeness. When many are known to have given 
information, the more cautious will be encouraged, the less ener- 
getic be ashamed to be less public-spirited than their neighbours, 
and the critical anxious to correct mistakes. The first of the 
following letters, which is used for general purposes, takes
advantage of these tendencies, and the second is another excellent
example of the method of extending the organization : — 



1895. 

Dear Sir, I am forwarding herewith a copy of the "Labour Gazette"
for the current month, and beg leave to draw your attention to the article 

therein dealing with the state of employment in the 

The Labour Department is very desirous of making the information 
contained in these monthly reports as complete as possible, and trusts 
you will kindly assist by filling up and returning the enclosed form. 
You will notice that the form is of a very simple kind, and one that can
readily be filled up without much trouble. 

I may add that Returns are regarded as strictly confidential and are
only used to produce general statistical results in which the identity of
individual returns is lost. — I am, &c. 

March 1895. 

Dear Sir, — This Department has for some time past received 
monthly Returns, both from the Dock Companies, and the Ship-owners 
who do their own unloading work in the port of London, with regard 
to the number of Dock Labourers employed. These Returns are 
collected with a view to throwing light on the periodical fluctuations in 
the employment of this class of labour ; but the figures are published in 
a general total and not in such a way as to make possible the identifi- 
cation of particular firms supplying the information. The article on 
page 36 of the enclosed Labour Gazette will show you the use made 
of the Returns. 

Hitherto no exact information has been obtained with regard to 
employment of labour at the wharves, and you will readily see that the 
addition of such information would very greatly increase the value of 
the statistics. The managers of several of the most important wharves 
on both sides of the river have been good enough to promise to make 
monthly Returns ; and I should be greatly obliged if you could see 
your way to assist the Department by supplying the information speci- 
fied on the enclosed form, not later than the date there indicated. 

You will observe that a form is provided for the daily number of 
labourers employed, and, alternatively, for the average weekly number. 
The daily number would, on the whole, be the most useful for the pur- 
poses of the Department ; but if for any reason you cannot see your 
way to supply such detailed information, a weekly average would be of 
value. — I am, &c. 




Another letter may be given as serving to encourage those 
who have already engaged in the good work. It will be noticed 
that, though now more concise, it is still insinuating. 



Agricultural Labour in January, 

Dear Sir, — I am instructed by the Commissioner for Labour to ask 
you to be good enough to favour this Department with replies to the 
questions on pages 2-4 of this form, by Friday, 4th February 1898. 

I beg at the same time to thank you for your kindness in send- 
ing answers to questions put to you by the Department on former 
occasions. — Yours, &c. 

These letters are well worth noticing because they have 
assisted to build up a very efficient organization for information 
out of nothing, and have succeeded in eliciting answers from 
uninterested men of business, who are not given to spending 
time and trouble on unremunerative labour. 

Please forward this Return to the address on the back not later than the 
Fifth of the month succeeding that to which it relates. No postage 
need be paid. 

Return of State of Employment 

in Month of .......... 189

Name of Society 

Total number of members in Society at close of month : 

Number receiving out-of-work pay in last week of month 

(Do not include members on strike or locked out.) 

State, if possible, number of members entirely unemployed but not 
receiving benefit in last week of month 

State of employment for month

If any dispute, change in wages or hours of labour has occurred, please 
say, and the necessary forms will be forwarded at once 

Remarks 



Signed. 



Secretary, 
Date .......... 189



46 ELEMENTS OF STATISTICS. 

Examination of unemployment return.

The form given above is that issued to trade-union secretaries,
and it is by its means that the only perfectly definite measure
of want of employment is obtained. It should be remembered that
it is filled in monthly by the same official, and requires no
special explanation. Since the first attempt to draw up such a
form is apt to present difficulties, this may be noticed in
detail. First, we find an instruction as to the way it should be
returned. Most forms are provided with a printed envelope to save
trouble and mistakes, and postage is paid by the investigator.
Next comes a brief heading and the date, and then the name of the
society, an item used for reference and further inquiry, but not
for publication. Now we need to know chiefly the percentage
unemployed, but secondarily the total number. The questions most
easily answered should be asked, and the calculations done at the
central office, for the trade-union secretary may make
arithmetical mistakes. Again, it is not the numbers day by day
that are asked, for they are hardly known ; nor week by week,
which would give trouble ; nor even the average for the month,
for that might lead to guess-work ; but a definite day or week is
decided on, the same for all trades. For purposes of comparison
trade with trade, or month with month, this is found sufficient.
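
The central-office arithmetic described above can be sketched in a few lines of Python. This is an illustration only: the field names, the two sample societies and their figures are hypothetical, and adding members who are unemployed but not drawing benefit to those on out-of-work pay is an assumption about how the form's optional item might be used, not the Department's documented method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SocietyReturn:
    """One monthly return from a trade-union secretary (illustrative field names)."""
    society: str                                  # kept for reference, never published
    members: int                                  # total members at close of month
    on_out_of_work_pay: int                       # last week of month, strikers excluded
    unemployed_no_benefit: Optional[int] = None   # asked "if possible", so may be blank


def percent_unemployed(returns):
    """Percentage unemployed over all societies, computed centrally from raw counts."""
    members = sum(r.members for r in returns)
    if members == 0:
        raise ValueError("no members reported")
    unemployed = sum(
        r.on_out_of_work_pay + (r.unemployed_no_benefit or 0) for r in returns
    )
    return 100.0 * unemployed / members


# Hypothetical figures, for illustration only.
returns = [
    SocietyReturn("Society A", members=1200, on_out_of_work_pay=30, unemployed_no_benefit=6),
    SocietyReturn("Society B", members=800, on_out_of_work_pay=24),
]
print(round(percent_unemployed(returns), 2))  # 3.0
```

Asking only for raw counts and dividing once, over the pooled totals, keeps the secretary's task simple and avoids averaging percentages from societies of very different size.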

Definition of unemployment.

We notice next an important point connected with the definition
of Unemployment, and also an illustration of the necessity of
studying the figures at their source. It is not stated explicitly
in each Gazette whether men on strike or locked out are included
as "unemployed" or not. A reference to this form shows that they
are not included, and, therefore, before conclusions are drawn as
to the amount of work obtained year by year, the excellent
statistics relating to labour disputes given in another part of
the Gazette must be studied.

All members of trade unions out of work do not at once 
receive " benefit," the technical term for any payment of union 
funds, and therefore a correction must be made in some cases 
for those who have not yet come " on the society." The 
number is likely to be known accurately to the local secretary ; 
but if the form had to be filled in at a London office, say, for 
the whole Amalgamated Society of Engineers, the number 
would not there be known. At this point we are left in some
doubt as to the methods of the Department ; for we are not told 
whether these forms are sent to all local secretaries, or whether 
they are filled up centrally for whole districts, or what additional 
information is obtained from other sources. 

The next line is for a qualifying adjective which will serve 
to check labour correspondents' information, and to indicate
whether the last week was typical of the month. 

Subsidiary information.

When an organisation is ready for a special purpose, any
secondary use may be made of it that will not vitiate its chief
end. The Labour Department is always anxious to hear of all
changes of wages, and in general has to detect their existence
for itself, hence no opportunity of obtaining such information is
lost, and this widely circulating form is used for the purpose.

Lastly, if the informant is an intelligent man and acquainted 
with the methods of the Department, it is well to give him an 
opportunity of adding any relevant remarks that may occur to 
him. On the line " Remarks " might be given some reason for 
any exceptional numbers or information as to trade prospects 
which might furnish a clue to the Department for other investi- 
gations. The paper is signed and dated to show that it has 
been filled in officially and at the right time. 

These forms are not always filled in and returned to date ; 
sometimes special application has to be made for them, and
occasionally the necessary numbers have to be interpolated from 
other sources. 

The next form given is more complicated, and illustrates 
two of the methods mentioned, finding the number of persons 
employed and the number of days worked. 






Please fold and return this form by the 29th December to the address
given on back.

EMPLOYMENT AT COAL MINES.

Individual Returns are regarded as confidential and not published
separately.

County in which Pits are situated ........
Name of Firm or Company ........
Postal Address to which form should be sent ........
Number of shifts usually worked in each 24 hours ........

Col. 1. Names of Pits or Seams.
Col. 2. State whether bulk of Coal raised was "House," "Steam," "Gas,"
        "Manufacturing," or "Coking" Coal.
Col. 3. * Number of Workpeople paid on last pay-day in four weeks
        ending 25th Dec. 1897. (No.)
Col. 4. * Number of Workpeople paid on last pay-day in four weeks
        ending 26th Dec. 1896. (No.)
Col. 5. † Number of Days on which Coal was hewn and wound at the
        Colliery in four weeks ending December 25, 1897. (Days.)
Col. 6. ‡ Short Time. If any of the days stated in Col. 5 were
        shorter than usual, please state in Col. 6 the total amount
        of time to be deducted by Labour Dept. in four weeks.
        (Days. Hours.)

No. of "Other Workpeople" general to all or several Pits and not
included above, and Number of Days worked by them in four weeks ........

* The number of Workpeople should include all Men and Boys, &c.,
employed in and about the Pits, except Clerks and Managers. The number
should also include "Drawers" and others who may be paid by "Hewers."

† The number of Days, whether full or not, on which Coal was "hewn and
wound" should be inserted in this column. If on any of these days short
shifts only were worked, the extent of the time lost should be stated in
Column 6 ; but it should be left to the Labour Department to deduct from
the figures in Column 5 the Short Time, if any, given in Column 6. If the
time worked on Saturday is usually shorter than on other days, no reduction
should be made on that account.

‡ Short Time. — Please state here any special reasons for Short Time : —



Note. — A General Summary of the Returns received will appear on 15th January in
the Labour Gazette, which can be ordered through any Newsagent, price 1d.




Peculiarities of form.

The form has one or two peculiarities. A colliery company
has often several pits, so that in the first place it is not obvious
at which address this information can be most
readily given, while it is important not to waste
time and trouble in forwarding from office to office ; and, in the
second place, not only will work be done for different lengths of 
time in different pits, but also there will be variation from seam 
to seam in the same pit. This was not recognised in the first
form sent out, but a second had to be sent distinguishing the 
pits and asking for subsidiary information. In this form may 
be seen the modifications that must be introduced to suit the 
questions to particular industries. In a colliery the factor which 
determines the state of employment is generally not the number 
employed, but the number of days' work, the number of days
"coal is wound." A colliery at full work may make four, five,
or six days a week, or eleven days a fortnight (leaving one day 
a fortnight for repairs), according to the custom of the district 
and the state of the trade, and there may be two shifts or three 
in the twenty-four hours. If work is slack, the number of shifts 
per fortnight, which is really the essential quantity to know, will 
be diminished, and the alteration will very likely affect all
employés equally. Again, since the colliers are not all at work at
once, the question is not "how many are at work?" but "how
many are paid?" the pay-day, once a fortnight, or however
often it may be, being perhaps the only time when all the 
workers are together. The number at pay-day is, therefore, the 
number employed in the mine, a quantity varying as new seams 
are opened or old seams worked out, and the number of days 
on which there is work is the factor which determines the 
amount of work obtained per workman. Notice that the ques- 
tions, number of shifts per day, number at pay-day, and days at 
work, are precisely those which the manager will find easiest to 
answer. Since, however, days are of different lengths, depending 
on the demand for coal, the good working of machinery, the 
presence of the necessary trucks, and the efficacy of the railway 
arrangements for clearing the yards, and other circumstances, it 
is necessary to know whether on any working days, winding 
stopped early or the shift was shortened ; hence the question in 
column 6, which will give the manager more trouble. 
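
The deduction which the form reserves to the Labour Department (Column 5 less the short time in Column 6) might be worked as in the following sketch. It is not the Department's published procedure: the eight-hour working day used to convert hours into days, the weighting of each pit by the number paid at pay-day, the dictionary keys and the sample figures are all assumptions made for illustration.

```python
def effective_days(days_wound, short_days=0, short_hours=0.0, hours_per_day=8.0):
    """Days of work after deducting short time (Column 5 less Column 6).

    hours_per_day is an assumed length of a full day, used only to turn the
    "Hours" entry of Column 6 into a fraction of a day.
    """
    deduction = short_days + short_hours / hours_per_day
    if deduction > days_wound:
        raise ValueError("short time exceeds days on which coal was wound")
    return days_wound - deduction


def average_days_per_week(pits, weeks=4):
    """Average days worked per week, each pit weighted by workpeople paid."""
    total_people = sum(p["workpeople"] for p in pits)
    person_days = sum(
        p["workpeople"]
        * effective_days(p["days"], p.get("short_days", 0), p.get("short_hours", 0.0))
        for p in pits
    )
    return person_days / total_people / weeks


# Hypothetical return for two pits over the four weeks.
pits = [
    {"workpeople": 300, "days": 22, "short_hours": 8.0},  # one full day lost in short shifts
    {"workpeople": 100, "days": 18},
]
print(average_days_per_week(pits))  # 5.0625
```

Leaving the subtraction to the central office means the manager reports only what his books already show, which is the principle the text draws attention to.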

Forms for other industries.

In the form relating to dock labour, the question is simply,
how many are employed, not at the end of the month, but
day by day ; for labour at the dock fluctuates violently and
continually, as may be seen from the monthly diagram in the
Labour Gazette. On that relating to the Surrey
Commercial Dock, the question is again modified
to suit, it may be supposed, a special method of bookkeeping, and
reads, " What is the amount of wages paid at the end of each 
week?" Wages are perhaps a better measure of dock labour 
than number employed since the number of hours worked varies 
continually, men being taken on for long or short hours, but the 
rate of pay varies little. On both forms there is a question 
as to any special holidays or other events affecting work. On 
the form sent to pig-iron works, the question asked is as to how 
many furnaces are "in blast," or have been "blown out," or 
re-lit; on that relating to steel, iron, and tinplate works, the 
information required is " the number of shifts " worked in four 
weeks. Another form is to be filled up by a single correspon- 
dent for a wide district, and the returns are entered under the 
headings — Number of mills (1) running full time and giving full
employment; (2) running full time, but giving only partial 
employment ; (3) running short time ; (4) stopped. 
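
The remark above that wages may measure dock labour better than the number employed can be illustrated by a small calculation. The sketch below assumes a uniform rate of sixpence an hour and invents the two weekly returns; neither figure comes from the forms themselves.

```python
def to_pence(pounds=0, shillings=0, pence=0):
    """Convert pre-decimal money to pence (12 pence to the shilling, 20 shillings to the pound)."""
    return pounds * 240 + shillings * 12 + pence


def man_hours(wages_pence, rate_pence_per_hour=6):
    """Hours of labour implied by a wage bill, assuming one steady hourly rate."""
    return wages_pence / rate_pence_per_hour


# Two hypothetical weeks with the same number of men taken on.
weeks = [
    {"men_taken_on": 500, "wages_pence": to_pence(pounds=500)},
    {"men_taken_on": 500, "wages_pence": to_pence(pounds=250)},
]
for week in weeks:
    print(week["men_taken_on"], man_hours(week["wages_pence"]))
```

The count of men is identical in both weeks, yet the implied hours differ by half, which is the variation a return of numbers employed alone would conceal.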

Employment in agriculture.

Another instance of adaptation of the form to a particular
industry is afforded by the inquiries as to agriculture. In this
case the number of employers is very great, they are very much
scattered, and little used to statistical inquiries, and the
labourers are for the most part uncombined. On the other hand, in
the majority of villages agriculture is the predominant industry,
every one knows all about every one else, and any one intelligent
person can give an accurate account of the state of labour in his
district. It is necessary then to arrange with a labour
correspondent, a farmer, or a member of the Village Council, or
the chairman of the District Council, and to apply to him monthly
for information.

Only one general organization is necessary for the collection 
of the three groups of figures wanted by the Department for all 
industries. These groups relate to the state of employment, which 
fluctuates continually ; changes of wages, which in some cases 
take place at stated times, in others occur irregularly; and strikes, 
which may begin at any time and last a long while. In the 
case of agriculture, the three groups of questions are placed on 
a single form, though the practice has changed a good deal since
1893.

One form, that in use in 1894, asks for complete details as 
to wages and the number employed at the harvest, with a page 
devoted to strikes, and two spaces for remarks on the weather 
and on things in general. 

The next, July 1895, deals with haymaking, strikes, and 
wages. The questions here are as follows : — 



1. Were there any able-bodied agricultural labourers in irregular work
   in your Parish during the month of June?
2. If you answer question 1 in the affirmative, can you give the numbers
   and state about what proportion those in irregular work were of the
   total number of able-bodied agricultural labourers?
3. If you can give the particulars asked for in questions 1 and 2 for any
   neighbouring Parishes, kindly do so.
4. What daily or weekly wages are being paid in the district to the
   regular farm hands during haymaking? Also state how much is
   paid for overtime and what perquisites are given, such as food,
   beer, &c.
5. What daily or weekly wages are being paid in the district to extra
   hands during haymaking? Also state how much is paid for overtime
   and what perquisites are given, such as food.
6. Were there any agricultural strikes in your neighbourhood during
   June? If so, please give the following particulars with reference
   to each strike : —
   (1) The date ; (2) The cause ; (3) The duration ; (4) The
   result ; (5) The number of men affected.

There are differences in the forms for nearly every month in 
the year ; and the questions have been modified as experience 
suggested till they are finally as follows : — 



County ........ Union ........ Parish ........

I.— STATE OF EMPLOYMENT.

1. Approximate number of able-bodied agricultural labourers in
   Parish.
   (If this question has been recently answered by you, you need not
   repeat your reply.)
2. Were there any able-bodied agricultural labourers in irregular work
   in your Parish during the month of January 1898?
3. If so, can you say about how many were in irregular work in the
   last week of January 1898?
4. Was employment more regular in January 1898 than in January
   1897?

If you can give the above particulars for any neighbouring Parishes, or
for the whole of the Poor Law Union, kindly do so.

II.— CHANGES IN RATE OF WAGES IN JANUARY 1898.

Changes in Weekly Cash Rates of Wages of Ordinary Labourers in
January 1898.

(N.B. — Ordinary labourers do not include foremen, shepherds,
cattlemen, carters, waggoners, teamsters.)

Column headings:
  Locality in which Change took place. (State whether Change applies
    to the whole County, or to which Poor Law Unions or Parishes
    within it.)
  Approximate Number of Labourers who have had a Rise or Fall in
    Wages in January 1898.
  Cash Rates of Wages per Week: Before Change (s. d.) ; After
    Change (s. d.).
  Please state in this column for comparison what the Rate of Wages
    was in January 1897.

Name of Correspondent ........
Postal Address ........



For convenience of printing the exact spacing allotted for 
answers has not been introduced in these reprints. In the 
agricultural forms a great many square inches are allowed for 
such an answer as " Yes " ; in the others the space is allotted in 
proportion to the amount of information expected. The ques- 
tions as to wages on this form will be alluded to presently. 

The information collected by the Department as to trade
disputes is detailed and important. The principal questions
in the investigation relate to the causes of disputes, the
methods of settling them, and the total loss of money to
workpeople and employers. Of these the first two are not
statistical questions, but are inserted because the inquiry has
three objects : — (1) A general examination of the causes of and
remedies for strikes ; (2) an inquiry as to the course of each
particular dispute, so as to bring the Conciliation Act into
operation if possible, or by disseminating information to assist
an arrangement or compromise ; (3) the collection of statistical
information.

The Department is dependent on its own alertness for 
knowledge of the existence of disputes, and its chief sources 
of information are the daily press (London, Provincial, and 
Trade) and special local correspondents, who are expected to 
inform the Department, directly work is stopped owing to a 
dispute, on a special form. 

As to the question, Who knows the facts? obviously the 
only people are the employers and employed ; and since they 
may take different views on all subjects connected with the 
dispute, both parties must be addressed. In this investigation 
partiality and bias in the answers will be at a maximum ; the 
questions must be restricted as far as possible to facts about 
which two opinions are nearly impossible, and any questions 
which will not be answered willingly should be omitted. 

On pages 54, 55 is given the form sent to Trade Unions in 
1895, on pages 56, 57 that used in 1897, and on page 58 the 
letter accompanying the latter. 



LABOUR STATISTICS.— STRIKES AND LOCK-OUTS.

QUESTIONS AS TO Strikes and Lock-outs, 1895.

Questions.          Answers.          General Observations.

1. Name of employer, firm or company, and trade.
   [Where more firms are involved than one, or the strike or lock-out
   has been general over a locality, the number of employers or firms
   to be stated as nearly as possible.]
2. Cause or object of strike or lock-out.
3. Whether strike was ordered or approved by trade union.
4. Date of commencement and termination of strike or lock-out.
   (Answer spaces: Date of commencement. Date of termination.)
5. Result of strike or lock-out.
   [If dispute has been respecting increase or reduction of wages or
   hours of work, state exact amount of increase or reduction (if
   any).]
6. Mode of settlement.
7. Number of persons affected.
   (1) Number directly on strike or locked out
       a. At beginning of dispute.
       b. At end of dispute.
       [Distinguish between adult men and women, and apprentices or
       other young persons.]
   (2) Number employed in factories or works where strike or lock-out
       occurred, and who were thrown out of work thereby, but were
       not directly on strike or locked out.
8. Estimated total amount of wages earned in a full week (exclusive
   of overtime) by those affected immediately before and after strike
   or lock-out
       a. Directly affected.
       b. Indirectly affected.
   (Answer spaces: Before Strike or Lock-out. After Strike or
   Lock-out.)
9. Number of those on strike or locked out who belong to trade
   unions.
10. Amount expended in support of persons on strike or locked out
       a. By union.
       b. By other strike fund.
   (Answer spaces: Amount per head per week. Total amount expended.)
11. Number of persons who "went in" or returned to work before
    termination of the dispute.
12. Please suggest means of settling or preventing labour disputes.
Information for the use of the Labour Department, Board of Trade, 44 Parliament St., S.W.

STRIKES AND LOCK-OUTS.

Part I.
[To be forwarded as soon as possible, without waiting for settlement of dispute.]

Questions.          Answers.

1. Name of Trade affected.
2. Number of Firms involved.
   [If an Employers' Association is concerned in the dispute, please
   give the name and address of its Secretary.
   If there is no such Association, please give the names and
   addresses of the principal firms involved in the dispute.]
3. Cause or object of strike or lock-out.
   (Enclose copy of any application or Notice connected with the
   origin of the dispute.)
4. Date of the first day on which the workpeople were absent from
   work through strike or lock-out.
   (If notices were handed in, give also date of notice.)

Answer columns for questions 5 and 5a: Occupations. Men. Women.
Apprentices or other Young Persons.

5. State occupations and number of workpeople (Unionists and
   Non-unionists) directly on strike or locked out.
5a. State occupations and number of other workpeople (Unionists and
   Non-unionists) employed in above establishments who were thrown
   out of work owing to the strike or lock-out.

Total Number of workpeople affected *

* If any other workpeople were affected, respecting whom you can state no exact figures,
please give, if possible, the name and address of some person who could do so : —



Information for the use of the Labour Department, Board of Trade, 44 Parliament St., S.W.

STRIKES AND LOCK-OUTS.
Part II.
[To be forwarded as soon as the dispute is terminated.] 1897.

Questions.          Answers.

6. Date of termination of strike or lock-out, i.e., the last week-day
   on which the workpeople were on strike or locked-out, or the date
   when the places of the strikers were filled up.
   (If there was no definite end to the dispute, please state
   approximately when it may be regarded as practically closed.)
7. Result of strike or lock-out.
   (Enclose copy of any printed or written agreement that may have
   been made.)
8. Describe the steps taken which resulted in the settlement, giving
   the names of any organizations or persons who assisted in bringing
   this about.

If the result involved a Change in the Rate of Wages or Hours of Labour, give
the following particulars for all workpeople whose wages or hours were changed,
whether Strikers or not : —

Column headings:
  Occupations affected by Changes in Wages or Hours.
  Number of Workpeople whose Wages or Hours were changed.*
  Date from which Change takes effect.
  Rate of Wages † in a Full Week, exclusive of overtime: Before
    Change ; After Change.
  Hours of Labour in a Full Week, exclusive of meal times and
    overtime: Before Change ; After Change.

* This is not necessarily the number on Strike or Locked out.

† When there has been a change in piece rates, please give the percentage increase or decrease in piece
prices, and approximately the average earnings in a full week (exclusive of overtime) before and
after change.

Signature ........
Address ........
Date ........




Labour Department, Board of Trade,
44 Parliament Street, London, S.W., 1897. 

Dear Sir, — The Labour Department of the Board of Trade is 
desirous of obtaining a complete and accurate record of Strikes and 
Lock-outs, and Changes in Rates of Wages and Hours of Labour in 
the United Kingdom as they occur, for publication in the Annual 
Reports presented to Parliament, and also in the " Labour Gazette," 
which is issued monthly. 

These statistics are collected and published by the Department in 
pursuance of the following Resolution adopted by the House of 
Commons on the 2nd March, 1886 ;— 

"That in the opinion of this House immediate steps should be taken to ensure
in this country the full and accurate collection and publication of Labour 
Statistics." 

As the value of these statistics is greatly increased if the parties 
concerned co-operate with the Department by supplying accurate 
information, I should be glad if you would kindly answer as many as 
possible of the questions asked on the inner pages of this form so far
as they relate to the ........



If from any cause you are unable at present to answer the questions
on Part II. of the form, will you be so good as to fill in and return
Part I. at once, and send Part II. as soon as it is possible to do so.

I have to add that any information you may be good enough to 
furnish will be used solely for statistical purposes, and will not be 
published under your name. 

A circular asking for similar particulars is addressed to the employer 
affected by this dispute. — Yours faithfully. 



Chief Labour Correspondent. 




The letter given with the later of these forms is a particularly 
careful one, showing the object of the inquiry, promising secrecy, 
and guaranteeing an impartial survey by the statement that 
similar forms are sent to employers and workmen. The 
forms addressed to employers are precisely similar in general 
appearance. 

Change of form.

The main difference between the forms is that the later is
divided into two parts, the first of which can be filled up directly
work is stopped by a dispute, so as to give the
Department a clue as to its magnitude and cause.
The second part is detachable, and is to be preserved till the
dispute is ended, and then forwarded. The advantages of this 
method are that the Department has early information as to 
the exact facts about the strike, and that the figures are given 
while the facts are fresh in the mind of the informant, whereas 
at the end of a long struggle, they might have been forgotten. 
Should the second part not be forwarded, the Department would 
of course write for it, or send a duplicate. 

Question 1 on the earlier form is modified and split into two
on the second. Question 2 on the later is simpler than the
parenthesis of question 1 on the earlier, but asks for the more
important information as to employers' associations, which will 
lead to the blank schedule being sent to the addresses given. 

Question 2 of 1895, 3 of 1897, is the same on all four forms 
(the two to trade unions, and the companion forms to employers); 
it is not a statistical question, and probably leads to vagueness 
and to contradictory statements on the part of employers and 
employed ; but the new parenthesis (" enclose copy, &c.") is 
important, for it leads to definite statements about which there 
can be no dispute. 

The next question on the earlier form had of course to be 
altered for the new double sheet. Since the chief statistical 
information needed relates to the exact number of days' work 
lost, it is necessary to know exactly the date of the commence- 
ment of idleness ; this day is therefore very carefully defined on 
the later form, as not that on which notices were sent in or any 
preliminary steps taken, but that of the actual commencement of 
hostilities. Question 6 (date of termination) is also carefully 
worded. The date of notices (question 4) gives useful sub- 
sidiary information. 

There is considerable difference between question 5 and 5a
on the later and the corresponding question 7 on the earlier.
The only difficulty in using the information obtained from the
new form arises at this point. The number affected by a strike,
especially the number indirectly affected, changes continually,
rising gradually to a maximum and then rapidly decreasing as the
dispute draws to a close. The 1895 form did not give enough
information, for the numbers at intermediate dates cannot be
deduced from the numbers at the beginning and at the end, so that
we have not the necessary data for determining what we chiefly
want, the number of days' work lost (i.e., the sum of the numbers
of days lost by each person affected). In the case of a long
dispute, however, this information is revised monthly at least,
as is shown by the monthly report in the Labour Gazette.
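
Why the numbers at the beginning and end are insufficient can be seen from a small worked case. The sketch below is an illustration under stated assumptions, not the Department's method: the counts and dates are invented, each reported number is assumed to hold until the next report, and Monday to Saturday are treated as working days.

```python
from datetime import date, timedelta


def days_lost(observations, end):
    """Sum of days lost by each person affected.

    observations: (date, number idle) pairs; each number is assumed to hold
    until the next observation. end: last week-day of the dispute, inclusive.
    Sundays are skipped; every other day counts as a working day.
    """
    obs = sorted(observations)
    following = obs[1:] + [(end + timedelta(days=1), 0)]
    total = 0
    for (start, number), (next_start, _) in zip(obs, following):
        day = start
        while day < next_start:
            if day.weekday() != 6:  # 6 is Sunday
                total += number
            day += timedelta(days=1)
    return total


# Hypothetical dispute: numbers rise to a maximum, then fall away.
observations = [
    (date(1897, 7, 5), 100),
    (date(1897, 7, 12), 400),
    (date(1897, 7, 19), 150),
]
end = date(1897, 7, 24)

print(days_lost(observations, end))  # 3900
# Using only the first and last counts over the same 18 working days:
print((100 + 150) / 2 * 18)          # 2250.0
```

With only the opening and closing numbers the estimate falls well short, because the maximum in the middle of the dispute is never seen; periodic revision supplies exactly the intermediate observations that the sum requires.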

The spreading of a strike.

The chief improvement in the new form consists in
allotting separate spaces for different occupations. Several
classes of workpeople will probably be affected in
different ways by a strike in a complicated industry.
Thus if the cotton spinners are on strike, very likely
the carders will go out either on a grievance of their own or 
from sympathy. The spinners' assistants, the piecers, are at 
once thrown out of work, as are also the overlookers of the 
mules. As the strike continues all the departments of the 
spinning mill will be closed, one after the other. In the form 
four lines are allowed for those directly affected, eight for other 
classes unwillingly on strike. A great dispute, however, is not 
limited in its effect to the spinning mills. The supply of yarn 
falls off and the weavers are stopped ; then the export trade is 
diminished, and dock labourers and sailors are thrown out of 
work, and so the influence of the strike spreads. It is out of 
the question to estimate completely these indirect effects ; but 
in order to trace them as far as possible, space is given on the 
second form for the address of any one who can give information 
about them. 

Question 6 on the earlier form, " Mode of settlement ? " has 
been expanded considerably in the later one, since the question 
cannot well be answered in a single word, and the exact details 
are important for the non - statistical part of the inquiry. 
Question 5 in the earlier has also been altered ; the important 
request for printed agreements is added, and the parenthetical 
part has been grouped with question 8 so as to form the new
question 9. This alteration, the same on employers' and workmen's
forms, is worth special attention. The distinction between
"directly" and "indirectly affected" is practically useless, and
difficult to maintain in filling up the form. It is far more
important to distinguish the different classes of workpeople, as
can be done in the nine lines of the new form. Again, it is
difficult to state the "total wages before and after," and the
question leads to inaccuracies ; the new question 9 is far better,
for it is precisely that easiest to fill in, and most useful when
done,
and is in the exact form wanted by the Department for its 
register of changes of wages and hours. It is important in 
this question 9 to include all, whether on strike or not ; hence 
we have the italics in the heading and a footnote to the second 
column. This footnote could be improved, for at present the 
wording is a little obscure, and the notice might be put with 
advantage in the heading. 

There remains a series of questions which have been dropped
out in the later form. It may be supposed that it was found
that the answers were not accurately given, that
the inclusion of the questions overloaded the form,
and by tending to inaccuracy in the answers led to inaccuracy in
other details ; while in cases when it was possible to obtain 
correct answers, it was found best to do so by other methods or 
a separate inquiry. There are two sets of questions : trade 
unionists are asked the amount they spent ; employers the value 
of capital left idle. 

Question 3 on the older trade-unionist form has nothing to 
do with the statistical inquiry. Question 9 simply affects the 
relation of unionists to non-unionists, may lead to exaggerated 
answers, is not wanted for tabulation, and is apart from the main 
inquiry. Question 10 belongs properly to a separate inquiry ; the 
total might not be known to the secretary who fills in the form, 
and the amount expended by unions on " strike benefit " is com- 
piled annually from other sources. The question is too compli- 
cated to be placed with advantage at the end of a long form. 
Question 11 is too vague to lead to the information wanted, 
though knowledge of the facts is needed. Question 12 is hardly 
likely to yield any results worth having, since all possible means 
have long been canvassed. The answers are, however, tabulated 
in the Report on Strikes of 1894 (C. 7901), which may be studied 
with advantage in connection with these forms, and some of 
them may be quoted : — "Give in to the wants of the men, so 
that they are not extraordinary." "Abolish capitalism." "No 
means have yet been discovered." "Make all men Christian." 
"Fair argument." "A little more common honesty on the part 
of employers" (pp. 229-240). 

The questions omitted in the later, but present in the earlier 
employers' forms are subject to similar criticisms. The sub- 
division of question 9, distinguishing summer and winter wages, 
and the separate columns for hours, are only in the later form. 
A comparison of these two forms with any number of the Labour 
Gazette and the Annual Report on Strikes and Lock-outs of 1894 
will throw considerable light on the uses and difficulties of 
forms of inquiry. 






Section 4. — Statistics of England's Foreign Trade. 

The original schedules which lead to many other statistics 
are interesting, but limits of space must restrict us to one more 
typical inquiry, that which leads to our statistics of foreign 
trade. 

In the population census the filling in of the form is com- 
pulsory and done by the householder ; in the wage census 
the answers were voluntary and given once and for all by the 
employer ; in the various inquiries undertaken by the Labour 
Department the answers are voluntary, but in many cases 
periodic, so as to become quasi-official. The method of collec- 
tion of import and export statistics is a blend of all these. 

[Sidenote: The informants.] 

There are three classes of persons who know the 
facts in question — the sender of the goods, the 
custom-house official through whose hands they pass, and the 
recipient or his agent. Circumstances decide that, in the case of 
exports from the United Kingdom, the exporter or his agent 
sends an account of the quantity and value of goods de- 
spatched to the Statistical Department of Customs ; that, in 
the case of imports, the receiving-agent hands over an account 
of goods to be landed to the custom-house officials, who verify 
the account, roughly if the goods are duty free, carefully if they 
are liable to duty ; and that, in the case of transhipment, the 
goods are treated in the same way as imports at the port of 
landing, and to some extent verified at the port of embarkation. 

The blank forms, being filled in by officials as part of 
their duty, or by agents thoroughly used to the task, need no 
covering letter, and may be made as complicated as necessary ; 
no questions are inserted but only blank tables. An examina- 
tion of the forms in use will show what are included as exports 
and imports in the Board of Trade totals, and what is the total 
amount of information available for tabulation. 

[Sidenote: The quaesita and data.] 

The quantities we wish to measure in this investigation are : 
the volume or weight and value of all goods which have an 
exchange value, which leave our shores or reach 
them from without, subdivided as regards classes 
of commodities and countries of destination or origin ; the values 
being those at the times of loading or unloading. The quanti- 
ties we can measure are sharply distinct from these, being the 
records of values and volumes which reach the Board of Trade. 
We should therefore examine the forms to decide — (1) What 
part of imports and exports are recorded ; (2) whether the values 
are correctly given, (3) the quantities accurately registered, 
(4) the commodities accurately defined, (5) the countries of 
origin and destination accurately distinguished in the returns. 



[Sidenote: Examples of information.] 

On reaching port the ship's master has to send in an 
account, of which the following is an abridged 
specimen : — 

No. 1.                                              REPORT No. 980.* 
Port of X.                     STEAMER. 
                     (If Sailing Vessel or Steamer.) 

Official No. 
No. of Register, 
Date of Registry, 

Ship's Name.                                        Marianne. 
Tonnage.                                            700 
British or Foreign. (If British, Port of Registry ; 
  if Foreign, Country to which she belongs.)        BRITISH. 
Number of Crew.     British Seamen.                 12 
                    Foreign Seamen.                 - 
                    Total. 
Name of Master, and whether a British or 
  Foreign Subject.                                  H. Hind. 
Port or Place from whence arrived.                  Havre, France. 

Cargo. 

Column headings : 
1. Name or Names of Places where laden in order of time. 
   (If any wreck fallen in with or picked up, to be stated.) 
2. Marks. 
3. Nos. 
4. Packages and Description of Goods, Particulars of Goods stowed 
   loose, and General Denomination of Contents of each Package of 
   Tobacco, Cigars, or Snuff intended to be imported at this Port. 
5. Particulars of Packages and Goods (if any) for any other Port in 
   the United Kingdom. 
6. Goods (if any) to be Transhipped or to remain on Board for 
   Exportation. 
   Name of Consignee. 

Entries : 
Where laden : Havre, France. 
Name of Consignee : Smith. 

Marks.            Nos.     Description. 
Paris to London            600 pkgs. Fruit and Perishables. 
COK               1392   } 
AE                495/6  } 
KG                340/9  } 68 pkgs. Merchandise. 
FOT               1/50   } 
AJ                3/6    } 
CK                1      } 
AC                10     } 
KL                40     } 70 cases Wine. 
ACD               20     } 
WD                166      5 cases Woollens in transit to Liverpool. 
O&D               1        1 case Brandy. 

Stores. 

Surplus Stores remaining on board, viz. [illegible] Tobacco. 
Number of Alien Passengers (if any) - Nil. 
Pilot's Names 
At what Station Ship lying - - - South Quay. 
Agent's Name and Address - - - C. J. C. 

I declare that the above is a just report of my Ship and of her Lading, and that 
the Particulars therein inserted are true to the best of my knowledge, and that I have 
not broken Bulk or delivered any Goods out of my said ship since her departure from 
Havre, the last Foreign Place of Loading. 

(Signed) H. Hind, Master. 
Signed and declared this 13th day of October 1896 
In presence of 

(Countersigned) 

Collector. 

* i.e., 980th ship at X. since 1st January. 

The goods for quick transit are passed at once, and a special 
form is sent to the Customs Establishment similar in character 
to that on p. 67. The remaining goods are treated 
either as dutiable or as duty-free articles. In the 
list before us, ten cases of wine are entered for home use, and an 
account is sent in to the Statistical Office ; sixty cases are ware- 
housed and another account (as to quality, quantity, and value) is 
sent in ; the whole are registered as imports. Twenty of the ware- 
housed cases are removed to another port and re-exported ; an 
account is sent, and they are entered as exports of foreign goods. 
Twenty are put on board ship as stores at the original port, and 
twenty more removed to another port for the same purpose, and 
of this the central office takes no account ; the remainder are 
removed to another warehouse, still in bond, and on leaving that 
will be treated in one of the four ways just mentioned. Other 
dutiable articles are treated in the same way. 

[Sidenote: Examination of goods.] 

Goods not sufficiently described or not answering to their 
description are opened, their contents entered on a "bill of 
sight," and an account sent in. Private effects are 
separately examined, being described on a "sufferance" 
form ; if they are bona-fide personal goods no record is 
kept of them, except in the case of dutiable goods, which are 
treated as ordinary imports. If the dutiable goods are con- 
cealed, either among private effects or merchandise, and forfeited, 
they are not reckoned as imports. 

Bullion is entered on a separate form and kept distinct 
throughout the accounts. 

[Sidenote: Free goods.] 

The duty-free goods, if for transhipment at another port, are 
sent there under seal, and barely examined ; they are treated at 
the central office in the same way as dutiable 
transfer goods. The remaining free goods, which 
in general form the bulk of the cargo, are entered on such a form 
as follows, which is worth notice, for it is a specimen of the 
rough material from which our foreign trade figures are 
evaluated. 



ENTRY FOR FREE GOODS. 

(This space is for the use of the Officers of Customs. Examination.) 

Port.                      Dock or Station. 
Importer's Name.           (No.      ) 
Ship's Name.               Marianne. 
Master's Name.             H. Hind. 
Rotation No.               980. 
Date of Report.            13/10/96. 
Port or Place whence.      Havre, France. 

Marks and    No. of Packages and Description of Goods,       Quantity.         Value. 
Nos.         in accordance with the Official Import List.                      £ 

COK 1392     One   Goods Manuf. N.O.E. Billiard Cue Tips                       28 
AF 495/6     Two   Leather Shoes                             10 doz. prs.      58 
KG 340/9     Ten   Cotton Manuf. Trimmings                                     140 
                   Embroideries                                                280 
                   Piece Goods, not Muslins                  300 yds.          8 
FOT 1/10     Ten   Gloves of Leather                         11,240 doz. pr.   12,316 
 "  11/5     Five  Silk Broad Stuffs                                           10,400 
 "  16/20    Five  Works of Art : Plaster Casts                                380 
                   Statuary                                                    1,280 
                   Pictures by Hand                                            10,200 
 "  21/5     Five  Books Bound                                                 300 
 "  26/30    Five  Bronze Manuf. Ornaments                                     38 
 "  31/5     Five  Metal Manuf. Ornamental 
                   Brass-headed Nails                                          24 
 "  36/40    Five  Silk Manuf. Dresses, Mantles 
                   Trimmings                                                   1,816 
 "  41/50    Ten   Goods Manuf. N.O.E. : Fancy Goods                           110 
                   Horseless Carriage                                          160 
                   Brushes                                                     78 
                   Glue                                                        110 
                   Billiard Chalk                                              12 
                   Hardware                                                    116 
AJ 3/6       Four  Stationery Ink                                              48 
CK 1         One   Iron and Steel Manuf. 
                   Machinery, British, returned                                24 

[Further entries in the Quantity column, whose rows cannot be recovered 
from this copy : 3, 4 cwt., 3 cwt., 4 cwt., 3 cwt.] 

I enter the above goods as free of duty, and declare 
the above particulars to be true. 

Dated this 13th day of October 1896. 

(Signed) J. Jones, 
Importer or his Agent. 

[Sidenote: Verification of data.] 

The information so received is usually accepted at the 
central office without inquiry. It frequently happens, however, 
that the form is not properly filled in by the agent, the values 
often being omitted. When this is so, it is the 
duty of the clerk at the port of entry to fill in the 
value, in accordance with a list of current prices with which he is 
provided. It may happen that he has to appraise the goods on 
inspection, a process leading in some cases to great error, which 
is enhanced when not even the quantities are given. When 
there is a palpable error or omission in the form, or when the 
price appears out of the common, a query is sent from the 
central office to the port : e.g., with reference to such a form as 
that just given, the following correspondence might arise : — 

1. Pictures by hand, £10,200. Explain high value. Answer: 
Correct ; invoice was seen ; pictures by Millet. 

2. Books bound : is weight or value incorrect? Answer: 
Both correct ; advice seen ; old and valuable books. 

3. Goods entered as "goods manufactured, chip plaiting" : 
explain nature, and state if description is correct. Answer: 
Correct ; wood shaving plaited and occasionally mingled with 
horse-hair, &c. 

4. Potatoes, 40 cwt., £62. Weight or value? Answer: 
Value correct. Weight should be 400 cwt. 

Thus any unusual entries are liable to be checked and 
verified. 
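
The kind of check that prompts such a query can be sketched in modern terms. The Python fragment below is an illustration only: the function name, the tolerance factor, and the list price for potatoes are assumptions introduced here (the price is simply chosen close to the unit value implied by the corrected entry), and nothing in it is taken from the Customs list of current prices. It flags an entry when the unit value implied by the declared quantity and value lies far from the listed price.

```python
def needs_query(quantity, value, list_price, factor=3.0):
    """Return True when an entry looks out of the common.

    quantity, value : the declared figures on the form
    list_price      : current price per unit (hypothetical, supplied by caller)
    factor          : assumed tolerance; the entry is queried when the implied
                      unit value exceeds factor * list_price or falls below
                      list_price / factor
    """
    if quantity <= 0 or list_price <= 0:
        # A missing or impossible quantity is itself a palpable error.
        return True
    unit_value = value / quantity
    ratio = unit_value / list_price
    return ratio > factor or ratio < 1.0 / factor


# Hypothetical list price, in pounds per cwt. of potatoes (an assumption).
POTATO_PRICE = 0.16

print(needs_query(40, 62, POTATO_PRICE))   # the entry as first declared
print(needs_query(400, 62, POTATO_PRICE))  # the entry after correction
```

With these assumed figures the first entry is queried and the corrected one passes. A rule of this sort is no help with pictures or old books, where no list price exists, and there the clerk's judgment remains the only check.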

[Sidenote: Possibility of errors.] 

In the case of goods not easily valued, or of miscellaneous 
goods not easily tabulated, errors must arise in this way ; and 
another error may enter if a clerk, who does not 
wish to receive too many queries from head- 
quarters, enters at ordinary rates goods of exceptional value ; 
but when staple commodities and large quantities are involved, 
all the persons concerned will be familiar with the forms they 
have to fill, the prices will be known, and so in important cases 
errors will be at a minimum. The import total values, there- 
fore, are the sum of many quantities of various degrees of 
accuracy, and it is not difficult when looking through the list of 
items in the annual report to see which are specially liable to 
error. Such commodities as old books, works of art, goods 
where sale depends on the fluctuations of fashion, racehorses, 
and so on, have values varying from day to day, and their 
exact value in the balance of imports and exports cannot be 
determined. 

[Sidenote: Exports.] 

The quantities and values of exported goods are filled in by 
the shipper or agent, and sent to the central office 
within six days of the ship's clearing. The follow- 
ing is an abridgment of the form used : — 

[Export form, p. 69 : not legible in this copy.] 


The forms for British and Irish goods are distinct from those 
for foreign, free and duty-paid, goods; and there are distinct 
export forms for transhipments, which have already been regis- 
tered as imports. In these cases the specification and quantities 
are likely to be correct, but there are causes which may falsify 
the values. If they are to be subject to an ad valorem duty, they 
may be undervalued ; if they are adulterated goods, masquerad- 
ing as genuine, they may be over-valued. It seems hardly 
possible to estimate these errors. 

[Sidenote: Definition of official imports and exports.] 

We are now in a position to define imports and exports 
according to their meaning in the Board of Trade 
Returns ; as, for instance, when for 1895 the value 
of imports is stated as £416,000,000, and of 
exports as £285,000,000, of which £60,000,000 are re-exports 
of foreign or colonial goods. 

This total for imports includes all goods landed through the 
custom-houses, including goods immediately shipped as stores, 
or returned from customers unused. Goods immediately re- 
shipped at the same or another port, or held in bond and then 
re-shipped, are included both as imports and exports. Bullion is 
not included, being given separately, nor cargo unlanded and 
so reported, nor personal luggage or private effects, except when 
duty is charged. The value reckoned is the nominal exchange 
value when or just before they are landed ; that is, their value is 
already increased by freight, but not increased by duty. 

The total for exports includes all goods entered on ships' bills of 
lading, does not include ships' stores or passengers' luggage, nor 
cargo unlanded and so reported, nor bullion, which is given separ- 
ately. The value is reckoned at the time they are put on board. 
Ships leaving our shores to be sold to foreigners are now included. 

The treatment of coal throws light on this paragraph. Coal 
taken for use on the voyage is registered, but not included 
among exports ; coal as cargo is included. 

Among exports not registered are cash taken privately and 
personal effects ; among imports not registered are smuggled 
goods, and cash and personal effects. 

For the causes and extent of the resulting differences between 
imports and exports, Sir R. Giffen's two papers * on the subject 
should be consulted. 

* Essays in Finance, Second Series ; and Journal of the Royal Statistical 
Society, 1899. 



CHAPTER IV. 
TABULATION. 




Leaving now the consideration of blank forms of inquiry, let 
us turn to the methods by which our data, accumulated on these 
forms, can be tabulated. At first sight the tabulation of so many 
million census forms, so many schedules of wages, and so many 
lists of goods imported, seems mere office work, to be done 
mechanically,* only requiring accuracy and not subject to 
scientific analysis. Tabulation does, indeed, involve a great 
deal of automatic labour ; but the determination of the exact 
form of the table and the choice of the headings to which the 
totals shall correspond task the administrative statistician, and 
are worth the closest study. 

[Sidenote: The function of tabulation.] 

The function of tabulation in the general scheme of a statistical 
investigation is sufficiently definite ; it is to arrange in easily 
accessible form the answers to those questions 
with which the investigation is concerned. If it 
is required to know, for instance, the number of persons of each 
sex and age-group in all the districts of the country, the figures 
in the table must show these numbers. Or, to take a less definite 
problem, we want all the information possible as to labour dis- 
putes. In studying the forms issued by the Labour Department, 
we have seen that the information which can be obtained is not 
precisely that which we require. The problem then is so to 
tabulate our information that our totals may give answers as 
near to our requirements as possible, and it can easily be 
found by experiment that the way to do this is by no means 
obvious. 

* An account of Mr Hollerith's electrical tabulating machine, used in the 
XIth Census of the U.S.A., will be found in Dr Bertillon's Cours Élémentaire, 
p. 579 seq. 

Not only must the figures be grouped so as to answer the 
questions put forward in the original scheme, but if the in- 
formation is of wide and varied interest, as in all the inves- 
tigations so far considered, the data must be studied from many 
points of view, and tabulated so that students in all branches of 
knowledge may be able to extract from our tables the infor- 
mation they require. Thus the population census is used by 
the financier, the legislator, the merchant, and the commercial 
traveller ; political economists turn to it for light on the de- 
velopment of industry, and on the change of numbers in 
each trade ; those interested in social questions will study 
the ages and sex-distribution in various districts or occu- 
pations ; the sociologist and biologist will need accurate infor- 
mation as to the growth of population and the change of age 
distribution. 

To take more specific points, the blue-book which con- 
tains the tabulation of foreign trade statistics will be ex- 
pected to show how our trade with each country is de- 
veloping, whether we are holding or improving our position in 
certain markets ; whether we are exhausting our supply of raw 
materials ; whether some new commodity is yet of importance. 
It must be remembered that the original material is not 
accessible to the public, that they are dependent on the 
information extracted for them, and that, though it would be 
possible to turn through all the forms for special data, yet the 
labour needed would be prohibitive, while a little more detail 
in the tabulation might easily have isolated the information 
needed. 

[Sidenote: Three groups of tabulations.] 

For convenience, the methods of tabulation may be divided 
into three groups : A. The simple statement of totals of persons 
or things which satisfy given conditions, such as 
the number living in a town, or the total value of 
imports from France ; B. The grouping of a great number 
of units in relation to some particular property possessed 
by all, with the object, not of answering assigned questions, but 
of putting the material in a form ready for use in further investi- 
gations — e.g., the population according to ages, or wage-earners 
according to the value of their wages ; C. The tabulation of 
non-numerical answers in suitable groups to give a view of the 
whole — e.g., the causes of strikes or the state of employment. 
The division between groups A and B is not always definite. 

In the tabulation the convenience of the reader must be 
studied. The table must be so arranged that any totals required 
can instantly be found. This is to a great extent a question of 
typography, the use of suitable founts for figures and headings, 
and also of the choice of the right shape and size of page. 
Supposing the best possible choice made in these respects, our 
rule will then be to get the maximum amount of information into 
the minimum space. 



[Sidenote: Classes of tabulation.] 

Group A. — Thus we can have single tabulation, answer- 
ing one or more groups of independent ques- 
tions, as : — 

Number and Membership of Trade Unions.* 

Year.    Number of Trade Unions    Total Membership of these 
         at end of Year.           Unions at end of Year. 

1896     1,317                     1,493,375 
1897     1,307                     1,611,384 
1898     1,267                     1,644,591 


Double tabulation shows the subdivision of a total according 
to two categories, in the following example according to sex and 
age :— 



Classification of Paupers in Ireland. — Total Numbers who 
received Relief during the Year ended Lady Day 1892.† 

Ages of Persons Relieved.    Males.     Females.   Total. 

Under 16 years                44,391     43,648     88,039 
Of 16 and under 65 years     132,370     79,045    211,415 
Of 65 years and upwards       35,121     45,668     80,789 

All ages                     211,882    168,361    380,243 


* Compiled from the Sixth Annual Abstract of Labour Statistics, p. 1. 
† Ibid., p. 102. 
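
A double tabulation carries its own check: the grand total must be reached both by adding the rows and by adding the columns. The short Python sketch below, offered only as an illustration with names chosen for the purpose, recomputes the margins of the Irish pauper table from its six cell entries.

```python
# Cell entries of the double tabulation (paupers in Ireland, by age and sex).
cells = {
    "Under 16 years":           {"Males": 44_391,  "Females": 43_648},
    "Of 16 and under 65 years": {"Males": 132_370, "Females": 79_045},
    "Of 65 years and upwards":  {"Males": 35_121,  "Females": 45_668},
}


def margins(table):
    """Return (row_totals, column_totals, grand_total) for a two-way table."""
    row_totals = {row: sum(cols.values()) for row, cols in table.items()}
    column_totals = {}
    for cols in table.values():
        for name, count in cols.items():
            column_totals[name] = column_totals.get(name, 0) + count
    grand_total = sum(row_totals.values())
    return row_totals, column_totals, grand_total


rows, columns, total = margins(cells)
# The grand total must be the same whether reached by rows or by columns.
assert total == sum(columns.values())
print(rows)
print(columns)
print(total)
```

Any printed margin that disagrees with a sum recomputed in this way points to a slip in copying or printing somewhere in that row or column.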






More information may be included thus : — 



Classification of Paupers in England and Wales. — Total 
Numbers who received Relief during the Year ended Lady Day 
1892.* 

Ages of Persons Relieved.    Indoor.    Outdoor.     Total.       Metropolis.   Other Parts 
                                                                                of England 
                                                                                and Wales. 

Under 16 years               111,782      441,805      553,587    100,671         452,916 
Of 16 and under 65 years     232,284      385,299      617,583    148,066         469,517 
Of 65 years and upwards      114,144      287,760      401,904     64,779         337,125 

All ages                     458,210    1,114,864    1,573,074    313,516       1,259,558 


A TREBLE tabulation can be used, subdividing the total into 
three distinct categories, with cross totals for each group. Thus 
the following table gives separate divisions according to age, 
sex, and district ; percentage lines, in a distinct type, are also 
introduced : — 



* Ibid., p. 101. 



[Table, p. 77 : treble tabulation according to age, sex, and district, with percentage lines. The figures are not legible in this copy.] 



The same process can be further extended : the example 
in the table opposite shows an arrangement for a QUADRUPLE 
tabulation, distribution by district, date, sex, and occupation, 
with subsidiary information ; but it is generally better to use 
two or more tables than to increase the complication, unless 
it is necessary to bring several categories into close relation. 
Suitable varieties of type will often make comparisons easy in 
a very complex table. 

[Sidenote: Tabulation of census material.] 

Looking now at the census householders' schedule (p. 23), 
it will be seen that there are about twelve different items 
of information about each person : county, town, 
parish, position in family, civil condition, sex, age, 
occupation, industrial position, infirmity, birthplace, and house- 
room. These could be tabulated in 66 different single, 220 
double, or 495 treble tabulations, so that there is plenty of 
scope for choice. 
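
The three counts agree with the number of ways of choosing two, three, and four items out of twelve. That reading, in which a single tabulation draws on two of the items, a double on three, and a treble on four, is an inference from the arithmetic and is not stated in the text. A short Python check, with names chosen only for illustration:

```python
from math import comb


def tabulation_counts(items):
    """Ways of choosing 2, 3 and 4 of the recorded items.

    Pairing these with 'single', 'double' and 'treble' tabulations is an
    assumption made to reproduce the figures quoted in the text.
    """
    return {
        "single": comb(items, 2),
        "double": comb(items, 3),
        "treble": comb(items, 4),
    }


print(tabulation_counts(12))
```

For twelve items this gives 66, 220, and 495, the figures quoted.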

[Sidenote: Mr Booth's tabulation.] 

To fix our ideas, we will take occupation as the main sub- 
division, and examine Mr Booth's use of the census 
returns, say for London Printers.* 
First he gives a treble classification — occupation, sex, and 
age — using columns 4, 5, and 6 of the schedule. 



Census Divisions, 1891.    Females.    Males.                        Total. 
                           All Ages.   -19.      20-54.    55-. 

1. Printer                 1,316        9,988    21,784    1,921     35,009 
2. Lithographer, &c.         809          757     3,037      437      5,040 

Total                      2,125       10,745    24,821    2,358     40,049 


Then follows a single table, district and numbers, using the 
information on the back of the schedule. 



Distribution. 

E.       N.       W. & C.   S.        Total. 

5,884    9,835    7,577     16,753    40,049 


* Life and Labour of the People, vol. vi., p. 189. 

[Folding table, to face page 78 : quadruple tabulation for ... Wales and Scotland in 1881 and 1891, with columns for numbers employed (000's omitted), numbers per 10,000 females above 10 years of age, and totals of men, women, and all for 1881 and 1891. The row headings are missing and the figures are not reliably legible in this copy.] 



Three simple tables are then given, relating to heads of 
families, using columns 2 and 4 (sex), 2 and 10 (birthplace), 
and 2, and 7, 8, 9 (industrial status). 

His next table uses columns 2 and 6, and is as follows : — 







Total Population Concerned. 

                     Heads of     Others 
                     Families.    Occupied.    Unoccupied.    Servants.    Total. 

Total                18,048       16,060       47,257         854          82,219 
Average in Family    1            .89          2.62           .05          4.56 
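
The second line of this table appears to be obtained by dividing each class, and the total, by the number of heads of families. That is an inference checked against the printed figures, not a statement of Mr Booth's procedure. A minimal Python sketch, with illustrative names:

```python
population = {
    "Heads of Families": 18_048,
    "Others Occupied": 16_060,
    "Unoccupied": 47_257,
    "Servants": 854,
}


def per_family(counts, families_key="Heads of Families"):
    """Divide every class, and the total, by the number of heads of families."""
    families = counts[families_key]
    averages = {name: round(n / families, 2) for name, n in counts.items()}
    averages["Total"] = round(sum(counts.values()) / families, 2)
    return averages


print(sum(population.values()))
print(per_family(population))
```

The four classes add to the printed total, and the quotients, rounded to two places, agree with the printed averages.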



The next table (not here given) is a single classification 
according to number of rooms and servants, a most ingenious 
indirect use of the scheduled information ; and the last is an 
example of the legitimate use of a quadruple tabulation — 
occupation, industrial status, sex, and age — given on the next 
page. 



[Table, p. 80 : quadruple tabulation according to occupation, industrial status, sex, and age. The figures are not legible in this copy.] 

[Sidenote: The census tabulations.] 

It would be difficult to find a better example of tabulation 
of a great multitude of details to serve a special purpose. The 
census authorities had in many cases not tabulated 
the necessary details, and it was necessary to turn 
through the original schedules to get at the facts. For such 
work as this, the function of tabulation is simply to provide 
the answers to definite questions. Thus the census reports 
show how many persons of each sex and age-group belong 
to certain industries in certain places, in a quadruple tabulation 
extending over many pages, each page relating to one district, 
and this table may be used for accomplishing many separate 
purposes : each item is already a total ready for use. It is 
impracticable from limits of time and space, even if it were 
desirable, to tabulate all the possible groups of qualities which 
can be made from the twelve statements on each census form ; 
a good tabulation will aim at providing only those statements 
which are of practical use. Thus many simply descriptive totals 
are given, such as the numbers of each sex and age in each parish 
in the United Kingdom, to serve primarily for administrative 
purposes ; and many statements which will afford the economist 
and sociologist the opportunity of tracing the progress of in- 
dustries, of studying the ages of workpeople in different occu- 
pations, the changes in age-grouping of the nation ; and some 
further tables might be given to throw light on problems of 
cause and effect, such as the average ages in town and country, 
the connection between infirmities and occupation, or the ages 
of marriage in various districts or industries. 

It is interesting to open one of these great tables of figures, 
such as are generally to be found forming the bulk of a blue- 
book, and taking a figure at random, ask "Why 
is this figure printed, what question does it answer, 
to whom can it give information?" For instance, in the Eighth 
Report on Trade Unions, p. 257, we find that the United 
Brickworkers' and Brick Wharf Labourers' Union spent £20 
on funeral expenses in 1894, an average of 3s. 7^d. per member. 
As an isolated statement this may interest a very small number 
of persons ; but that small number has a right to expect 
that they shall find the figures relating to their union tabulated 
in a general official book ; to them it may be as important as 
the item, on the same page, of £5,481 spent by the Boiler- 
makers. From this point of view, the question of inclusion of 
such small items is simply one of space. If space is limited, 
a selection would be made of larger quantities only, as being 
likely to concern more people. 

[Sidenote: Importance of raw material.] 

But there is a reason of quite another character for printing 
such items as these. The raw material, on which the totals 
in such tables are based, is not accessible to the 
student except by means of this Report. Now, the 
compiler of these statistics cannot know from what particular 
point of view they will be studied. It may be desired to 
examine and group trade unions according to their expendi- 
ture on different items, to study their history, classifying them 
as fighting organisms and as friendly societies. The tabula- 
tions needed cannot well be foretold. The material is there- 
fore given in the rough, in order that the tabulation may be 
made by each student according to his needs. At the same 
time the most suggestive totals are given as one of these 
possible methods of tabulation ; and in the summary of such 
a report, the items are retabulated, the rough material being 
omitted, in those ways which the editor thinks most useful. 

[Selection of raw material.]

When space is much too limited for any publication in extenso of
the items, a careful selection must be made of those to be
printed; and it is this selection that is generally open to most
criticism. Owing to the great admiration for uniformity generally
to be found in the official mind, valuable space is wasted on such
statements as * —



COVENTRY: 1891.                                          MALES.

Shipwright: Ship, Barge, &c., Builder (Wood)   -   -   -    1
     "        "     "     "      "    (Iron)   -   -   -    0

while all the males — masters, traders, skilled workmen, labourers,
errand-boys — engaged in the cycle trade in Coventry are included
in —




[Economy of space.]

In such cases, two useful rules might be applied: omit all
numbers under, say, 500 when by so doing a line of print
would be saved; and give all numbers over 10,000 correctly only
to the nearest 100, and so for other digits in proportion, thereby
reducing the width of columns of print. If, for example, we
knew to the nearest 100 the exact numbers in each district
and occupation in which as many as 1,000 were employed, our
knowledge would be as complete as we needed; and it is doubtful
whether the space occupied by this tabulation would be more than
that already devoted to the subject. In many cases, on the other
hand, it is essential to have the raw material quite unchanged.
Each tabulation must be judged on its own merits.

* Census Report.
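The two rules for saving space can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `abridge`, its parameter names and the sample entries are hypothetical, the thresholds are the ones suggested in the text, and the proviso "when by so doing a line of print would be saved" is ignored for simplicity.

```python
def abridge(count, omit_below=500, round_above=10_000, unit=100):
    """Apply the two suggested rules to one tabulated number.

    Returns None when the entry would be omitted, otherwise the
    figure as it would be printed.
    """
    if count < omit_below:
        return None                      # rule 1: drop very small entries
    if count > round_above:
        # rule 2: give large numbers only to the nearest `unit`
        return (count + unit // 2) // unit * unit
    return count


# Hypothetical entries, invented purely for illustration.
entries = {"Occupation A": 1, "Occupation B": 3622, "Occupation C": 12345}
printed = {name: abridge(n) for name, n in entries.items()
           if abridge(n) is not None}
print(printed)
```

Whether a given table can bear this treatment is, as the text says, a matter to be judged case by case; the sketch only shows the mechanical part.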

[Tabulation of the Poor Law Returns, 1833.]

It may be useful to take a particular group of answers, and
discuss what tabulations will throw most light on the questions
at issue. The Poor Law Commissioners of 1833 collected
information from a thousand villages in England and Wales on the
following six points among others: the wages of an agricultural
labourer in summer and in winter, both with and without the
inclusion of beer as part payment, his annual earnings, and the
subsidiary earnings of his wife and children. It may be supposed
that the chief object of the Commissioners was to find whether
the labourers' families earned enough for their support, and what
proportion was earned by the wives and children.

The following scheme of tabulation would show in what 
counties the labourer was badly off: — 



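              |      Average Annual Earnings of
   County.    |---------------------------------------
              |    Man.    |   Family.   |  Together.
--------------+------------+-------------+------------
              |            |             |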











The counties might be taken in alphabetical order for convenience
of reference, or in geographical order with subordinate averages
for groups (e.g., Eastern: Norfolk, Suffolk, Essex); or the
counties might be arranged in the order of the total earnings, so
that it could be seen at a glance in which counties the labourers
were worst off.

To show the number of villages, county by county, in which 
the earnings were below a certain minimum, or within certain 
limits, the following table might be used : — 



Annual Earnings of Men and Families.

                     Number of Villages in which the        Average Earnings in
                     Total Earnings averaged                County of
                     [limits of the seven groups
                     illegible in this copy]
                     (1)  (2)  (3)  (4)  (5)  (6)  (7)      Man.     Family.  Together.

In Norfolk   -   -    0    1    3    6    4    3    2       £30      £11      £41
  Percentages of
  Total Number
  of Villages -  -    0    5   16  31½   21   16  10½       73       27       ...

In Suffolk   -   -    0    3    4    5    3    2    2       £28      £11      £39
  Percentages of
  Total Number
  of Villages -  -    0   16   21   26   16  10½  10½       72       28       ...

In Essex     -   -    1    3    6    7   10    3    1       £28      £10      £38
  Percentages of
  Total Number
  of Villages -  -    3   10   19   23   32   10    3       74       26       ...

In Eastern
  Counties   -   -    1    7   13   18   17    8    5       £28 10   £10 10   £39
  Percentages of
  Total Number
  of Villages -  -    1   10   19   26   25   12    7       73       27       ...

This table can be used in the above complex form or simplified.
The number of subdivisions of money to be distinguished
depends on the space at disposal and on the number of villages
which would be entered in each. A table in which most of
the entries are 1 or 0 is open to criticism. In the above table
the villages are too few to allow accuracy in percentage.

[Tabulation to show correlation.]

It will be seen that this table would furnish the answer to
almost all questions which could be put as to total earnings.
For instance, if we wish to see the relation between total
earnings and the family's subsidiary contribution, we should look
at the smallest totals in the last column and see if they
corresponded with the largest percentage of family earnings. If
we found signs of correspondence we should re-arrange the counties
in the order of these subsidiary percentages, and see if they were
approximately in order of total earnings also. This is an example
of tabulation to show correlation, the correspondence in the
occurrence of two sets of phenomena.
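The re-arrangement described above amounts to comparing two orderings of the same counties. A minimal Python sketch follows; the county names and figures are invented for illustration and are not taken from the Returns, and the summary number used (Spearman's rank coefficient) is a later device swapped in here for convenience, not the author's procedure, which is simply to inspect the two orders side by side.

```python
def ranks(values):
    """Rank 1 = smallest value. Assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def rank_correlation(xs, ys):
    """Spearman's coefficient for untied data: 1 - 6*sum(d^2)/(n(n^2-1))."""
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical counties: (total earnings in pounds, family share per cent.)
counties = {"A": (41, 24), "B": (39, 27), "C": (38, 28),
            "D": (35, 31), "E": (36, 29)}
totals = [t for t, _ in counties.values()]
family_share = [p for _, p in counties.values()]

# Counties re-arranged in order of the subsidiary percentage.
by_share = sorted(counties, key=lambda name: counties[name][1])
rho = rank_correlation(totals, family_share)
print(by_share, rho)
```

In this invented case the order by family share is exactly the reverse of the order by total earnings, so the coefficient is -1; real returns would of course show a much looser correspondence.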

[Wages and earnings.]

Another important group of questions arising in connection
with these tables is: What is the relation between weekly wages
and annual earnings, and what proportion of the wage is generally
paid in kind? We shall not now require the statements as to
subsidiary family earnings. In records of agricultural wages the
most common statement is, e.g., "wages in this district are from
10s. to 12s. a week." Now, a farm labourer does not generally earn
as much in winter as in summer, because wages are reduced to
correspond to the smaller amount of work necessitated by failing
light; from this cause annual earnings will be less than the
weekly wage multiplied by 52. Besides this wage he generally
receives special money at hay and wheat harvests, and also many
payments in kind, such as daily beer, house and ground at reduced
rent, and other privileges. It is generally best to value all
these, and compute his earnings thus: —

                                        £   s.  d.
10s. for 38 weeks   -   -   -   -      19   0   0
12s. for 9 weeks (summer)   -   -       5   8   0
Hay harvest, 1 week   -   -   -         0  15   0
Wheat harvest, 4 weeks  -   -   -       5   0   0
Beer, 1s. per week  -   -   -   -       2  12   0
Cottage and ground  -   -   -   -       5   0   0
Other perquisites   -   -   -   -       1   5   0
                                      -----------
                                      £39   0   0 = 15s. per week.
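The valuation above is plain arithmetic in pounds, shillings and pence, and can be checked with a short Python sketch. The helper names `to_pence` and `from_pence` are hypothetical; the hay-harvest payment is taken as 15s., which is the amount needed to make the items add up to the stated £39 and should be read as an assumption rather than a figure quoted from the Returns.

```python
def to_pence(pounds=0, shillings=0, pence=0):
    """Convert a sum in pounds, shillings and pence to pence (20s. = 1 pound, 12d. = 1s.)."""
    return (pounds * 20 + shillings) * 12 + pence


def from_pence(total):
    """Convert pence back to a (pounds, shillings, pence) triple."""
    shillings, pence = divmod(total, 12)
    pounds, shillings = divmod(shillings, 20)
    return pounds, shillings, pence


items = {
    "10s. for 38 weeks": to_pence(shillings=10) * 38,
    "12s. for 9 weeks (summer)": to_pence(shillings=12) * 9,
    "Hay harvest, 1 week": to_pence(shillings=15),   # assumed, see lead-in
    "Wheat harvest, 4 weeks": to_pence(pounds=5),
    "Beer, 1s. per week": to_pence(shillings=1) * 52,
    "Cottage and ground": to_pence(pounds=5),
    "Other perquisites": to_pence(pounds=1, shillings=5),
}

annual = sum(items.values())
weekly_shillings = annual / 52 / 12          # average earnings per week
excess_over_wage = weekly_shillings / 10 - 1  # relative to the 10s. weekly wage
print(from_pence(annual), weekly_shillings, excess_over_wage)
```

The result, 15s. a week against a general wage of 10s., is the 50 per cent. excess referred to in the next paragraph.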

In this case earnings are 50 per cent. above the general
weekly wage. An estimate of this nature has been made by the
late Mr Little for each county for 1867-70 and 1892. We can
tabulate the figures for 1833 in the same way for comparison,
in geographical order and with the county as unit.

[Beer.]

We must first consider the question, Has beer been at all
generally replaced by money? We can tabulate the figures as
follows to answer this: —

                            1833.                                 |    1893.     |
   1.        2.          3.          4.          5.         6.    |      7.      |     8.
 County.  Average     Average    Difference.  Number of  Propor-  | Difference   | Excess of
          Summer and  Earnings                Villages   tion to  | between Wage | 4 over 7.
          Winter      per Week.               where Beer Total.   | and Earnings.|
          Weekly                              is given.           |              |
          Wages.                                                  |              |






















In column 2 should be given the county average of the 
wages stated, without making any cash allowance when beer is 
given. Then if money has been replacing beer, we should find 
that in those counties where beer was most often given, wages 
had risen relatively to earnings more rapidly than in the 
counties where free beer was rare. Columns 4 and 7 show the 
differences for the two dates. When the entry in column 4 is 
greater than the corresponding entry in column 7, kind has 
been replaced by money. These excesses would be given in 
column 8. If money has replaced beer, the counties which 
have the greatest entries in column 6 should also figure high in 
column 8, and vice versa. 

[Winter and summer wages.]

The question, Are winter wages generally below summer wages,
and by how much? can be answered by the following scheme of
tabulation, which uses the data not employed in the previous
tables: —



                        Average Weekly     Number of Villages where the Excess of
                        Wage in            Summer Wages over Winter was
 Counties.              Summer.  Winter.   Nothing.  6d.  1s.  1s. 6d.  2s.  More than 2s.
                         s. d.    s. d.

 Norfolk   -   -   -    11  2    10  3        13
   Percentage of Number
   of Villages included                       46

 Suffolk   -   -   -    10  2     9  8        24
   Percentage of Number
   of Villages included                       70

 Essex     -   -   -    10  9     9 10        22
   Percentage of Number
   of Villages included                       52

 Eastern Counties  -    10  6     9 11        59
   Percentage of Number
   of Villages included                       57

[The entries in the columns from "6d." to "More than 2s." cannot
be assigned to their rows in this copy.]

These examples do not quite exhaust the useful tabulations
of these groups of figures, for we have not yet examined the
distribution of wages, that is the relative numbers paid at
different rates. These returns do not, however, illustrate such a
tabulation well, for we are not told the rates paid to individuals,
but only the rate prevalent in the villages.

Group B. — The grouping according to wages affords an 
example of the second method of tabulation. We have now 
no definite questions to answer, as in the method so far discussed, 
but a more general problem : given a mass of data, it is re- 
quired to tabulate it, so as to present the maximum amount of 
useful information. Our raw material is so many thousand 
isolated statements, which must be focussed, made to present 
definite meaning, and worked up so as to be useful for future 
comparison. 

[Statistics whose purpose is not definite.]

Some investigations are undertaken not to answer any definite
questions or to throw light on any given problem, but to
collect information which, though it has no immediate use, is
likely to be needed ultimately by many investigators occupied
with various questions. Such is a wage census. So long as we
have no sufficient account of
wages, we are badly informed as to one of the most important 
measurements of the social body, and economists and statisticians 
are continually hindered by the want of data essential for their 
work ; but the census has no immediate practical use, for knowing 
the height of wages does not help us directly to regulate that 
height. In such an investigation our object will be to examine 
the figures, and give all the groupings and averages which seem 
likely to be useful for any purpose ; and while doing this we 
shall imperceptibly pass to a different class of investigation ; 
we shall be finding a structure underlying our multifarious 
details ; we shall find that the chaos, which our figures present 
at first sight, obeys laws ; we shall be making a visible outline, 
and giving a definite shape to our apparently featureless mass. 

The complete discussion of this problem belongs to a later 
chapter ; but the tabulation can be begun without special 
technique. The examples taken will relate chiefly to wages, 
but the methods are quite general. 

[Selection of limits of groups.]

In the American Report on Wholesale Prices, Wages and
Transportation of 1891, the wages of some 10,000 persons are
detailed. It is proposed to consider their tabulation as a
homogeneous group. The results are given on pp. 91-2. In the
original publication the wages are given to half a cent; in the
second column, on p. 91, the numbers of wage-earners are given in
10-cent groups, from $.25 to $.34, $.35 to $.44, and so on, those
earning wages exactly at the dividing points being always placed
in the division below. Notice that the average wage of such a
group as $2.15 to $2.24 is not $2.20 if the wage-earners are
evenly distributed cent by cent, but the average of $2.15, $2.16,
. . . $2.24, i.e., $2.195.
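The remark about the group $2.15 to $2.24 can be verified directly. The following Python sketch assumes, as the text does, one wage-earner (or an equal number) at every cent of the group; the function name `even_group_average` is hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def even_group_average(low_cents, high_cents):
    """Average wage in dollars when earners are spread evenly, cent by cent,
    over the wages low_cents, low_cents + 1, ..., high_cents inclusive."""
    wages = range(low_cents, high_cents + 1)
    return Fraction(sum(wages), len(wages)) / 100


average = even_group_average(215, 224)
print(float(average))   # the ten wages $2.15 ... $2.24
```

The average falls half a cent below the round figure $2.20 because the group stops at $2.24 and not at $2.25.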

Looking at column 2, it will be seen that the figures present 
no order, follow no rule ; no structure has yet been found, our 
divisions are too narrow for our material. 

Now group the wage-earners with wider limits, as in column 6, 
where the numbers earning in half-dollar groups are given ; we 
have here a nearly regular sequence of numbers falling after the 
maximum in the second group. Going back to narrower limits, 
to find exactly at what divisions this regularity is first in evidence, 
we have in column 4 the numbers in 20-cent groups which show 
considerable, but not absolute regularity. The numbers in 
30-cent groups* are successively 75, 355, 674, 1,242, 740, 660, 
343, 310, 180, 181, 233, 32, 82, 3, 4, 8, 1, almost completely
regular except for the large group at $3.50. 

The question as to which of these groupings should be selected 
is to be decided by the number of separate items the eye can 
instantaneously grasp. In looking at the 25 numbers in the 
20-cent groups, or the 18 in the 30-cent, the meaning is lost in 
a maze of figures (though as many details as these could be 
properly shown in a diagram), but the 11 numbers in the
half-dollar groups are easily comprehended.

Stated in words, the result of our tabulation (column 7) is
that 6 per cent. of the wage-earners made from $.25 to $.74,
29 per cent. from $.75 to $1.24, and so on.

[Practical tabulation.]

For the practical work of the tabulation from the original
figures, we should take ruled sheets, enter at the head of
successive columns certain wage limits, and turning through
the items enter each wage by a dot in its appropriate column,
grouping them in fives and tens, to facilitate addition.

From the preceding paragraphs it is clear that we do not 
need to take separate columns for each cent from $.25 to $5.35 
for tabulation, but a little consideration is necessary to see how 
minute the limits should be to give the correct average. 



* Vide p. 121 infra.



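Suppose the entries in cent groups to be: —

[Tally sheet: five columns headed $1.70, $1.71, $1.72, $1.73,
$1.74, each wage being entered as a dot in its column, 46 entries
in all; the number of dots in each column is not recoverable from
this copy.]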

The average of the wages so entered can be quickly calculated
as $1.718.

If, on the other hand, we put all the 46 entries as simply
"between $1.70 and $1.74," or more exactly "as much as $1.70 but
less than $1.75," we should naturally take them to be all (for
purposes of averaging) at the middle point of this group, viz.,
$1.72.

If we have a sufficient number of items, the differences
between the average assumed and that calculated for each group
will be very slight. This is seen on p. 91; column 8 gives the
averages calculated from the entries in 10-cent groups, while
column 9 gives them on the hypothesis that for purposes of
averaging the numbers in the half-dollar groups may all be taken
at the middle points of their groups. The difference is greatest
in the first and last, the smallest groups. The general average
obtained from column 9 is $1.70, which is the nearest round
number to the true average $1.73. Hence, for the purpose of
obtaining the general grouping and average, we need only take
11 half-dollar columns for marking in our items.
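The comparison of columns 8 and 9 can be reproduced with a short Python sketch. The numbers of persons and the group averages below are the half-dollar figures of the table on p. 91 as far as they can be read from this copy, so they should be treated as a transcription and not as checked data; the variable names are hypothetical. Working in cents keeps the arithmetic exact until the final division.

```python
# Half-dollar groups, $.25-$.75 up to the single entry at $5.35.
persons = [317, 1472, 1297, 970, 506, 198, 254, 96, 4, 8, 1]

# Column 8: average wage in each group, calculated from the 10-cent entries (cents).
calculated = [62, 109, 149, 199, 253, 304, 351, 400, 450, 500, 535]

# Column 9: every earner taken at the middle point of his group (cents).
middle = [50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 525]


def general_average(numbers, wages_in_cents):
    """Weighted average wage in dollars."""
    total = sum(n * w for n, w in zip(numbers, wages_in_cents))
    return total / sum(numbers) / 100


from_column_8 = general_average(persons, calculated)
from_column_9 = general_average(persons, middle)
print(round(from_column_8, 2), round(from_column_9, 2))
```

The two results differ by about three cents on a wage of $1.70, which is the sense in which the middle-point hypothesis is said to be sufficient for the general average.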

For other purposes it may be advisable to work more minutely ; 
for in the lowest group, we shall wish to know how many are 
earning $.25, $.30, $.35 separately, for 5 cents is a perceptible 
difference on 25 cents. At the top also it may be useful to know 
the exact wages. 

[The Galtonian method.]

More minute entries again will be needed for the second
method of tabulation, which is as follows: — Suppose all the
wage-earners to be arranged in order of the magnitude of their
wages, those at $.25 at one end, those at $5.35 at the other.
Note the wages of men at given points in the row. The lowest
wage is $.25; one-tenth of the way along, that of the 512th
worker is between $.85 and $.95, . . . ; half-way up the wage is
$1.50. The figures at each tenth are given on p. 92. By this
means we get a very vivid idea of the distribution according to
wages.

These numbers cannot be obtained accurately if we have only
entered the details correct to half-dollars, but can be found from
the 10-cent grouping, which is therefore the classification to
be adopted. We must first determine in which of the small
groups the men one-tenth, two-tenths . . . up the group lie,
and then estimate their position inside the smaller group.
Thus, if we want the figure more accurately than "between
$.85 and $.95," as given above, we proceed as follows: — The
512th man from the bottom is the 82nd man in the group between
$.85 and $.95, for there are 430 earning less than $.85; this
group contains 169; if they were distributed regularly, 17 to
each cent, the 82nd man would be half-way through this group,
between $.89 and $.90. The hypothesis of even distribution is
sufficiently correct for most purposes, and this method affords
a sufficiently accurate means of determining the wage of the
workers at the tenth places. The resulting figures are given on
p. 92. If, however, we want to know the wage of the half-way
man more exactly, we see from the half-dollar groups that it is
between $1.25 and $1.75, a rough approximation shows it to lie
probably between $1.45 and $1.55, and then we rapidly turn
through our original data, isolating the wages at $1.46, $1.47,
. . . $1.55.*
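The interpolation for the 512th man can be written out as a small Python sketch under the same hypothesis of even distribution inside a group. The function name `wage_at_rank` and the layout of the groups are hypothetical; for brevity the 430 earners below $.85 are lumped into a single group, which does not affect the result for this rank.

```python
def wage_at_rank(rank, groups):
    """Estimate the wage (in cents) of the man standing at `rank` from the bottom.

    `groups` is a list of (lower_limit_cents, width_cents, number) in ascending
    order of wage; earners are assumed evenly spread within each group.
    """
    below = 0
    for lower, width, number in groups:
        if rank <= below + number:
            position = rank - below          # e.g. the 82nd man of 169
            return lower + width * position / number
        below += number
    raise ValueError("rank exceeds the number of wage-earners")


groups = [
    (25, 60, 430),   # all those earning less than $.85, lumped together
    (85, 10, 169),   # the group from $.85 to $.95
]
estimate = wage_at_rank(512, groups)
print(round(estimate, 2))   # in cents
```

The estimate falls between 89 and 90 cents, in agreement with the text; the same function applied to the full 10-cent grouping would give the wages at the other tenth places.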

A slight modification of this method is also useful. Take the
average of the lowest 512 (or tenth), namely, $.70; of the next,
namely, $1.03; and so on (see p. 92). These figures also give a
vivid view, and are very convenient for comparisons with other
groups.

The figures so far apply to only half of the data in the 
Senate Report. On p. 92 the whole are tabulated to give the 
average wages of the successive tenths. A comparison of the 
two groups so obtained shows how far the first half was typical 
of the whole. This method will be dealt with in a later chapter. 

* On this method see pages 127, 128. 



Tabulation of Wages — American Figures, 1891. (p. 91.)

      1.          2.          3.          4.          5.          6.        7.       8.        9.
 Earning Daily  No. of   Earning Daily  No. of   Earning Daily  No. of   Percent-  Average   Middle
   Wages, $     Persons.   Wages, $     Persons.   Wages, $     Persons.   age.    Wage in   point of
 as     and                as     and              as     and                      Group, $  Group, $
 much   less               much   less             much   less                               (taken
 as     than               as     than             as     than                               instead)

  .25    .35 }
  .35    .45 }     16      .25    .45      16
  .45    .55       59
  .55    .65       85      .45    .65     144      .25    .75      317      6.2      .62       .50
  .65    .75      157
  .75    .85      113      .65    .85     270
  .85    .95      169
  .95   1.05      201      .85   1.05     370      .75   1.25    1,472     28.7     1.09      1.00
 1.05   1.15      304
 1.15   1.25      685     1.05   1.25     989
 1.25   1.35       99
 1.35   1.45      458     1.25   1.45     557
 1.45   1.55      466
 1.55   1.65       72     1.45   1.65     538     1.25   1.75    1,297     25.3     1.49      1.50
 1.65   1.75      202
 1.75   1.85      329     1.65   1.85     531
 1.85   1.95       58
 1.95   2.05      273     1.85   2.05     331     1.75   2.25      970     18.9     1.99      2.00
 2.05   2.15       45
 2.15   2.25      265     2.05   2.25     310
 2.25   2.35       33
 2.35   2.45      101     2.25   2.45     134
 2.45   2.55      196
 2.55   2.65       13     2.45   2.65     209     2.25   2.75      506      9.9     2.53      2.50
 2.65   2.75      163
 2.75   2.85        2     2.65   2.85     165
 2.85   2.95       15
 2.95   3.05      129     2.85   3.05     144     2.75   3.25      198      3.9     3.04      3.00
 3.05   3.15 }
 3.15   3.25 }     52     3.05   3.25      52
 3.25   3.35       12
 3.35   3.45        0     3.25   3.45      12
 3.45   3.55      221
 3.55   3.65        5     3.45   3.65     226     3.25   3.75      254      5.0     3.51      3.50
 3.65   3.75       16
 3.75   3.85       11     3.65   3.85      27
 3.85   3.95        0
 3.95   4.05       82     3.85   4.05      82     3.75   4.25       96      1.9     4.00      4.00
 4.05   4.15        0
 4.15   4.25        3     4.05   4.25       3
 4.25   4.35        0
 4.35   4.45        0     4.25   4.45     ...
 4.45   4.55        3
 4.55   4.65        1     4.45   4.65       4     4.25   4.75        4      ...     4.50      4.50
 4.65   4.75        0
 4.75   4.85        0     4.65   4.85     ...
 4.85   4.95 }
 4.95   5.05 }      8     4.85   5.05       8     4.75   5.25        8       .2     5.00      5.00
 5.05   5.15        0
 5.15   5.25        0     5.05   5.25     ...
 5.25   5.35        1     5.25   5.35       1        At 5.35         1      ...     5.35      5.25

 Totals   -     5,123                   5,123                    5,123     100
                                                            Average Wage  -  -    $1.731     $1.70

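[Tables, p. 92; only partly legible in this copy.]

Wages of "Tenth" Men.

  Lowest Wage    -   -   -   -   $ .25
  [Of the nine intermediate entries only the following can be
  read: 1.39, 1.49, 1.75, 2.98.]
  Highest Wage   -   -   -   -    5.35

Averages of Successive Tenths of Workers.

                              First half     Whole of
                              of the data.   the data.
  Lowest tenth   -   -   -      $ .70          $ .79
  [Second to ninth tenths not legible.]
  Highest tenth  -   -   -       3.51           3.55
  General Average    -   -       1.73           1.82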



The tabulation of the data collected for the Wage Census 
on such forms as that on p. 36, illustrates well some of the 
difficulties involved. The items given on the main part of the 
schedule are of this kind : — 



Spinner - Time: 6.   Average Wage, 12s.   56½ hours.



[Tabulation in the wage census.]

Such returns are not perfectly definite, for if many are
employed in the same occupation in a mill, it is possible that
they will earn at different rates. Thus this entry of 6 at 12s.
might arise from either 6 men each earning 12s., or 2 at 10s.,
2 at 12s., 2 at 14s. (average 12s.); or 4 at 12s., 1 at 15s.,
1 at 11s.; or 5 at 12s. and 1 at 18s. — 12s. being the general
rate, but not the average, in these last two alternatives. Since
the purpose of the wage census was to give a comprehensive
account of wages adapted for use in all investigations, it should
show the numbers in all trades and subdivisions of employment by
age, sex, and district, the average and general rate of pay for
each group, and sufficient details to show the distribution about
the average in each group, for a mere average may conceal
exceptionally high or exceptionally low wages.
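How much a single line of the schedule can conceal is easily shown by putting three of the alternatives side by side. The Python sketch below uses only the standard `statistics` module; the dictionary name and the choice to report the mean with the lowest and highest wage are illustrative assumptions, not part of the census method.

```python
from statistics import mean

# Weekly wages in shillings of six spinners under three of the alternatives.
alternatives = {
    "6 at 12s.": [12] * 6,
    "2 at 10s., 2 at 12s., 2 at 14s.": [10, 10, 12, 12, 14, 14],
    "5 at 12s., 1 at 18s.": [12] * 5 + [18],
}

# For each alternative: (average, lowest wage, highest wage).
summary = {label: (mean(wages), min(wages), max(wages))
           for label, wages in alternatives.items()}

for label, (average, lowest, highest) in summary.items():
    print(f"{label:35s} average {average}s., range {lowest}s. to {highest}s.")
```

The first two alternatives share the average of 12s. though one is uniform and the other spread over 4s.; in the third, 12s. is the general rate but the average is 13s. An entry giving the average alone cannot distinguish these cases.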

On inquiry at the Labour Department as to whether the
original information had been given in a more detailed form than
the line above, or whether divergencies might be concealed, the
author learnt that the subdivision of occupations had been carried
to such an extent, that in practice, where there was any great
variation in the wages of workers under one heading, that heading
had been split up, so that each group was separately entered,
or that several groups were distinguished under one heading; and
that when there was reason to believe from the light of other
returns that this had not been done, supplementary inquiries
were made on this point, so that the original data were detailed
enough for any requisite fineness of tabulation.

The problem then was to tabulate the answers from the 
various factories in a district, to show clearly and succinctly 
the distribution of wages in each subdivision and in the whole. 
It can hardly be said with confidence that the method adopted, 
of which a specimen is given on p. 94, is entirely satisfactory. 

To clear our ideas let us suppose that the details on which 
the line relating to throwsters (time) was based were as 
follows : — 

   3 earning 14/    -  "average minimum rate."
   6    "    15/
  20    "    15/6
  10    "    16/
  20    "    17/6
   8    "    18/
  10    "    18/6
  10    "    19/
   8    "    20/6 }
  10    "    21/3 }   18 earning 20/11 on the average.

68 within 10 per cent. of the average for all, which is 17/7.

[Various methods possible.]

The process adopted in the tabulation may be supposed to
have been to separate from the whole group of returns a small
group of old men or inferior workers earning far below the
average, and enter them as a distinct minimum group, and to
separate a small group of the most skilled workers and enter them
as a maximum group. This is better than giving simply the highest
and lowest of the individual wages, for either of these may be
due to exceptional circumstances, and may be quite a long way
from that paid to any other person. The exact size of these
extreme groups must be determined from inspection of the returns
themselves. After this has been done, the remaining wages may not
be grouped close together; in the example taken they are
scattered between 15s. and 19s. To give some clue as to this
distribution the number earning within 10 per cent. of the



average is stated; this is probably the best way if only one
column can be devoted to it, but 10 per cent. is a wide limit
to adopt. Another method would be to give the limits within
which the wages of the 10 per cent. of the earners above and
10 per cent. below the average were contained: in this case 16s.
and 18s.

[Table, p. 94: specimen of the tabulation adopted for the Wage
Census, printed sideways in the original; not legible in this
copy.]

If, however, not more than 8 columns are to be devoted to
each group, the following arrangement would give much more
definite information, and it could have been made from the data
in hand, and would be well adapted for all the purposes for
which it would be required.

  Number employed   -   -   -   -   -   109
  General average   -   -   -   -   -   17/7
  Average of lowest tenth*  -   -   -   14/9
  Quartile†   -   -   -   -   -   -     16/
  Median†     -   -   -   -   -   -     18/
  Quartile†   -   -   -   -   -   -     19/
  Average of highest tenth* -   -   -   21/2

[The general summary.]

We are fortunately not dependent solely on the tabulation
as given above, for wages in industries as a whole are also
tabulated on the following plan, which is in a form most useful
for purposes of comparison (p. 96).

The lines giving percentages are most useful. We can at a 
glance compare the levels of wages in different industries. Thus 
in the cotton manufacture the average wage is 2s. higher than in 
the woollen ; and in the cotton there is a large group of highly 
skilled workers earning from 30s. to 35s., while in the woollen 
nearly half are close to the average, earning between 20s. and 
25s. In the jute and linen manufactures the averages are nearly 
the same, but in the former a larger proportion are below the 
15s. limit. In the silk manufacture there is an aristocracy as in 
the cotton, but it is smaller and better paid, for 12 per cent, 
earn more than 35s. This table is a masterpiece of concentration 
and clearness. 

* Vide p. 92.   † Vide p. 124.

[Table, p. 96: general summary of wages in the several industries,
with lines giving the number and the percentage of workers at each
level of weekly wages; not legible in this copy.]

[Change of wages returns.]

We will discuss next the tabulation of the figures relating
to CHANGES in RATES of WAGES collected by the Labour
Department. Specimens of the forms by which such information is
obtained were given among those relating to Strikes (p. 57).
Referring to them, it will be seen that the facts given are the
occupations and numbers affected, the dates from which the changes
took place, and the wages and hours in a full week exclusive of
overtime (a definition corresponding exactly to that used for the
wage census) before and after the change.



Extract from Table showing the Changes in Rates of Wages and
Hours of Labour of Ordinary Agricultural Labourers in Various
Districts of the United Kingdom in 1894, so far as reported to the
Board of Trade.*

                      Particulars of Changes in    Particulars of Changes in    No. of Male Agricultural
                      Summer Wages (1894 com-      Winter Wages (1894 com-      Labourers, Farm Servants,
 County and Union.    pared with 1893).            pared with 1893).            Shepherds, Horsekeepers,
                                                                                Horsemen, Teamsters,
                      Increase.  Decrease.         Increase.  Decrease.         Carters, in '91.
                                 Per Week.         Per Week.  Per Week.
 Lincolnshire—
   Gainsborough  -      ...      ...                          1/6 (15/ to 13/6)      2,466
   Louth   -   -        ...      ...                          1/6 (13/6 to 12/)      3,932
   Spilsby     -        ...      ...                          1/6 (13/6 to 12/)      3,288
 Norfolk—
   Aylsham     -        ...      1/ (12/ to 11/)              ...                    2,576
   Docking     -        ...      6d. (12/6 to 12/)            ...                    2,487
   Flegg, East and
     West  -   -        ...      1/ (12/ to 11/)              1/ (11/ to 10/)        1,108
   Forehoe     -        ...      ...                          1/ (11/ to 10/)        1,448

 [The column of winter increases contains one entry, 1/ (10/ to
 11/), whose row cannot be determined from this copy; the other
 entries in that column are blank.]

* From the second Annual Report on Changes of Wages, pp. 198-9; a
little compressed.






Extracts from Table showing the Changes in Rates of Wages of
Ordinary Agricultural Labourers in Various Districts of the United
Kingdom in the Summer of 1895, so far as reported to the Board
of Trade.*

                          No. of Male Agri-    Particulars of          Weekly Rate of Wages
                          cultural Labourers,  Changes in Summer       in Summer.
                          Farm Servants,       Wages (1895 compared
 County and Union.        Shepherds, Horse-    with 1894).             1894.       1895.
                          keepers, Horsemen,   Decreases in italics.
                          Teamsters, Carters,
                          in 1891.             Per Week.               s.  d.      s.  d.
 Durham—
   Stockton*   -   -           437             Decrease of 6d.         17  6       17  0
   Teesdale (Barnard
     Castle Rural Dist.)*      669†            Advance of 6d.          17  6       18  0
 Oxfordshire—
   Headington  -   -         1,118             Decrease of 1s.         12  0       11  0
   Henley (Hambleden
     Rural Dist., Bucks)     1,587†            Decrease of 1s.         12 to 14    11 to 13
 Norfolk—
   Flegg, East & West        1,102‡            Decrease of 1s.         11  0       10  0
   Forehoe     -   -         1,448             Decrease of 1s.         11  0       10  0
   Henstead    -   -         1,504             Decrease of 1s.         11  0       10  0
   Mitford and Launditch     3,622             Decrease of 1s.         11  0       10  0
   Smallburgh  -   -         2,264‡            Decrease of 1s.         11  0       10  0
   Swaffham    -   -         1,942             Decrease of 1s.         11  0       10  0
   Wayland     -   -         1,535             Decrease of 1s.         11  0       10  0
 Carnarvonshire—
   Carnarvon (Gwyrfai        1,124†            Labourers without
     Rural Dist.)                              food, advance of 1s.    19  0       20  0
                                               Labourers with
                                               food, advance of 1s.    11  0       12  0

* Agricultural labourers in this district are hired in March and April for a year
certain, and the change noted applies to the whole year, and not to the summer only.

† The number of agricultural labourers, &c., is for the Poor Law Union, but the
change applies to the Rural District only.

‡ This number is partly estimated.

* From the third Annual Report on Changes of Wages, pp. 118, 119, 121
(typography adapted).



TABULATION. 






[Margin: Agricultural wages: change in tabulation.]
The adjoining tables give examples of the way in which the
changes in agricultural wages were tabulated in the Second and
Third Report on Changes in Rates of Wages and Hours of Labour.
In the first table space is wasted by devoting separate columns to
increases and decreases, with the intention of making the table
distinct; while it is not clear whether "Winter 1894" means the
winter beginning in or that ending in that year.

In the second table, which refers to summer wages only, the 
columns are rearranged ; and increases and decreases printed in 
the same column, the latter in italics. In the Fifth Report all 
the information is printed in a clearer way, thus : — 

Winter Wages.*

                          Weekly Rates.             Increase or Decrease per
                                                    Week in 1897.
District.     Number.     Jan. '96.    Jan. '97.    Increase.    Decrease.
                          s. d.        s. d.        s. d.
Tendring      3,113       10 0         11 0         1 0          ...

The tabulation is repeated for the summer. 

[Margin: The number affected.]
The weakness in these agricultural returns is in the numbers
column. In the returns from other industries the numbers given
are those actually affected, but in this case it is not found
possible to obtain this number correctly, and the number entered
is that found under "agricultural labourers" in the 1891 census,
which includes the various categories as given in the above table.
When a change of wages takes place in a rural district, we may
perhaps assume that it is likely to be general, though, if it was a
reduction, it might not be made by the better employers; and
though the change will not take place in the same week throughout
the district, there is not likely to be much variation in this
respect. The change is generally made at the time that winter
wages give place to summer, or summer to winter; and a slight
increase or decrease may take place by making the winter
reduction or the summer advance later than usual. On the
whole, little error will be introduced by assuming that the change
stated affects all the adult agricultural labourers in the district,

* From the fifth Annual Report on Changes of Wages, p. 145. 






and it is quite probable that a proportional change* will take
place in the wages of horsekeepers, shepherds, and others,
though it may not in the case of boys, or old men who are
earning less than the district rate. The question, "Approximate
number of able-bodied labourers in parish?" is asked on the
inquiry form, but as the answers are not used, it may be
assumed that they are generally not given with sufficient
exactness.

[Margin: Lack of data.]
The object of the whole tabulation is to show the change in
the national weekly wages bill, but many details are lacking for
the complete calculation. In the case of agricultural labourers,
we need, in addition to these data, accurate statements of the
change of additional earnings, special payments, and payment
in kind. In all cases we need a more complete account of the
whole wage-bill as well as the change. For agricultural labourers
the material has just been published by the Labour Department;*
every year it receives returns from most of the 600 unions as to
wages at all seasons, whether there has been a change or not.

[Margin: Changes in county rates.]
The looseness in the returns as to numbers does not prevent
our calculating the change in the county or country rates, for
the numbers in each district affected by the change may be
expected to bear the same proportion to the numbers given in
the census returns, as the number of agricultural labourers of
the same class in the whole county or country does to the
census number.

The calculation for Durham in the above table for the
changes in summer wages 1894-95 may be performed as
follows:

              Average before               Proportional        Amount of change
              change.           Change.    number affected.    on wage-bill.
              s. d.                                            s. d.
Stockton      17 6              - 6d.      4                   - 2 0
Teesdale      17 6              + 6d.      7                   + 3 6

Total change in county, + 1s. 6d.
Proportional number in county, 73.

Effect on county average, $\frac{1\text{s. }6\text{d.}}{73} = \frac{1}{4}\text{d.}$
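The Durham arithmetic can be checked with a minimal Python sketch. The variable names are illustrative only and are not from the Report; the figures are those of the small table above, with the numbers affected taken in hundreds.

```python
from fractions import Fraction

# (union, change in pence per week, proportional number affected, in hundreds)
changes = [("Stockton", -6, 4), ("Teesdale", +6, 7)]

# Proportional number of labourers for the whole county, as stated in the text.
county_proportional_number = 73

# Change on the weekly wage-bill, in pence per hundred labourers' worth of units.
total_change_pence = sum(change * number for _, change, number in changes)

# Spread that change over the whole county to get the effect on the average.
effect_on_average = Fraction(total_change_pence, county_proportional_number)

print(total_change_pence)          # 18 pence, i.e. 1s. 6d.
print(float(effect_on_average))    # a little under 0.25, i.e. about a farthing
```

The result, 18/73 of a penny, is what the text rounds to a quarter of a penny.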

* On these points see Mr Wilson Fox's Report on Wages and Earnings
of Agricultural Labourers, 1900, p. 50, and pp. 111-157.

Here, for simplicity of calculation, the numbers affected are
taken to the nearest 100, a process which is not likely to affect
the average perceptibly.* This rough method is likely to give
the result as accurately as the original data make possible. A
similar process with suitable modifications can be applied to the
changes tabulated for other industries. The summary of such
returns for agriculture for all counties is as follows:

Comparison of the Net Effect of the Changes of Cash Wages
per Week paid in the Years 1896 and 1895 in certain Districts
in England and Wales.†

Net Effect: Net Effect of Changes on Weekly Wages. Increase (+) and
Decrease (-).

                              WAGES IN 1896 AS COMPARED       WAGES IN 1895 AS COMPARED
                              WITH 1895.                      WITH 1894.
                              Total**    Net Effect.          Total**    Net Effect.
District.                     Number.    Total.    Per Head.  Number.    Total.    Per Head.
                                         £         s. d.                 £         d.
England:
  Northern Counties             5,662    -  43     - 0 1¾       3,766    +    44   + 2¾
  Yorkshire, Lancashire,
    and Cheshire                2,897    + 100     + 0 8¼       3,942    -   126   - 7¾
  Eastern and Midland
    Counties                   69,869    + 666     + 0 2¼      89,576    - 2,045   - 5½
  Southern and Western
    Counties                   20,901    - 340     - 0 4       20,441    -   575   - 6¾
Wales                             ...      ...       ...        2,165    +    73   + 8¼

Total                          99,329    + 383     + 0 1      119,890    - 2,629   - 5¼

** The number given is the total of male agricultural labourers, farm servants,
shepherds, horsekeepers, in 1891, in the Poor Law Unions in which the changes
took place.



* The corresponding calculations for Oxfordshire are:

      12/     - 1/     11      - 11/
      13/     - 1/     16      - 16/
                               - 27/

Effect on county average: - 27/ divided by the proportional number in
the county = - 2d.

For Norfolk:

      12/     - 1/     134     - 134/

Effect on county average: - 134/ divided by the proportional number in
the county = - 4d.

† From the fourth Annual Report on Changes of Wages, p. xliv.






[Margin: Criticism of summary table.]
The value of this table is not obvious. It seems of little
importance to know how many persons were affected altogether;
though it is of some value to learn from a previous table that
58,578 persons received increases, and 40,751 decreases in 1896.
This total of persons affected is constantly given in these tables;
if a person receives an increase of 1s. one month, and loses it the
next, he is counted as 2, and his contribution to the next column
(net effect of change) is zero. This -£43 may mean that 2,000
persons received a decrease of 1s. each, and the remaining 3,662
(same or different persons) an increase of 3¾d. each, or any other
figures which would give the same total. The change per head in
the next column is unimportant; it only shows an arithmetical
quotient with no concrete meaning that can be expressed in words.
If it was replaced by another quotient, viz., $-\pounds 43 / n$,
where n is the number of agricultural labourers in the Northern
Counties, we should know the effect on average wages. In fact,
the table would be more useful thus:

Approximate Effect of Changes on National Weekly Wage Bill.

             INCREASES.            DECREASES.
             No.                   No.                   Net        Total No.    Average
District.    affected.   Total.    affected.   Total.    Change.    Employed.    Change.

...          ...         ...       ...         ...       ...        ...          ...


The figures given supply an example of the common practice 
of carrying out into detail a calculation which depends originally 
on incorrect numbers, in this case the number employed, and is 
therefore misleading throughout. Till the average (useless here 
in any case) is taken, the error in this quantity has no injurious 
effect. As shown above, the average here given could be replaced 
by another which would be of use, and which would be correct 
within limits that could be defined, and would be narrow enough 
for most purposes. 
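The criticism of the Northern Counties entry can be illustrated numerically. The sketch below, in Python with names of our own choosing, shows that the mixed pattern suggested in the text (2,000 men losing 1s. and 3,662 gaining 3¾d.) and a quite different, purely hypothetical pattern (all 5,662 losing the same small sum) yield the same net total of about -£43, so that the printed "per head" quotient cannot distinguish between them.

```python
PENCE_PER_POUND = 240

def net_effect_pounds(groups):
    """Net weekly change in pounds for a list of (persons, pence per head)."""
    return sum(persons * pence for persons, pence in groups) / PENCE_PER_POUND

# Pattern suggested in the text: 2,000 lose 1s. (12d.), 3,662 gain 3 3/4 d.
mixed = [(2000, -12), (3662, 3.75)]

# Hypothetical alternative: every one of the 5,662 loses the same amount.
per_head_pence = -43 * PENCE_PER_POUND / 5662
uniform = [(5662, per_head_pence)]

mixed_total = net_effect_pounds(mixed)
uniform_total = net_effect_pounds(uniform)

print(round(mixed_total), round(uniform_total))  # both round to -43
print(round(per_head_pence, 2))                  # about -1.82d., printed as 1 3/4 d.
```

Dividing the same -£43 by the whole number of agricultural labourers in the district, rather than by the 5,662 counted as affected, would give the effect on average wages that the text asks for; that number is not stated in the passage and is therefore not assumed here.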

Further, since the column of numbers affected is admittedly
wrong, the figures should be given to the nearest 1,000 rather
than to units, even if no attempt was made to estimate the new
figure; "between 5,000 and 6,000 are affected" is a more useful
and correct statement than "5,662 persons belonged in 1891 to
a class in some undefined way connected with that in question
in 1896."

The discussion of Group C, the tabulation of non-numerical 
answers, must be postponed till we have analysed the nature and 
use of averages. 






CHAPTER V. 
AVERAGES. 




It is natural, in a book with the present title, to allot a
considerable space to averages. By the use of averages complex
groups and large numbers are presented in a few significant
words or figures; and thus the two definitions of statistics,
the Science of Averages and the Science of Large Numbers, are
reconciled.

[Margin: Averages and means.]
Some writers have attempted to draw a distinction between
averages and means, but no general agreement has been reached
as to the exact senses in which the words are to be separately
applied.* The best distinction may be made by deciding that an
average is a purely arithmetical conception, such as the average
length of life in a varied population, which does not correspond
to any particular group, but is only a short way of expressing an
arithmetical result; while the word "mean" is to be applied to
some objective quantity, such as the mean height of Englishmen,
about which all height-measurements are grouped according to a
definite law, which will be discussed in the sequel.†

* Compare the article "Moyenne," by Dr Bertillon, in Dictionnaire
encyclopédique des Sciences Médicales, with this chapter. See also the
paper by Dr Venn in the Statistical Journal, 1891, and chap. xviii. in his
Logic of Chance.

† See Part II., Section 1, infra.

A. Arithmetic Averages. We may rapidly pass by some of the
common uses of the word "average," and pick out those which
will prove of use in statistics. An average is sometimes used
merely to save big figures. The average weight of the University
crew is given, only because it is more usual to speak of a man's
weight being 12½ stone than of eight men's weight being 12½ cwt.,
and it is easier to connect the former with men's weight in
general. Similarly, if we are comparing the value of the
exportations of some commodity in two periods of ten years each,
we should say that the yearly average in the period 1870-79 was
£10,000,000, and in 1880-89 was £11,000,000, rather than that
the totals were £100,000,000 and £110,000,000.

[Margin: The common denominator.]
This leads to the second ordinary use of the word. If we
were comparing the ten years 1870-79 with the eleven years
1880-90, and the totals in the periods were £100,000,000 and
£132,000,000 respectively, we should obtain no grasp of the
difference till we had reduced them to a common denominator by
dividing by the number of years, and found that the averages in
the two periods were £10,000,000 and £12,000,000. This class of
averages is well known in cricket; sometimes the total number of
runs made or wickets taken by each cricketer are stated also, but
these are rather as so-called statistical curiosities than as having
much bearing on the skill or luck of the players. The numbers
by which the season's performances are judged are the quotients
of the number of runs by the number of innings, of the number
of wickets by the number of runs, and so on, all quantities
being reduced to a common denominator. A consideration
of the best methods of comparing cricketers or counties, and
an exposure of the fallacies inherent in the present system,
would afford a useful exercise in the use of averages and the
choice of the most appropriate kind. The average in this
sense is very common in mechanics. The average pressure
per square inch, the average work done by an engine per
minute, the average speed of a train, are quantities which it
is frequently necessary to use. Such an expression as the
average rate of interest is precisely similar.
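The reduction to a common denominator described above may be sketched in Python as follows. The dictionary and its labels are illustrative only; the totals are the supposed figures of the text.

```python
# Supposed totals from the text: (total value in pounds, number of years).
periods = {
    "1870-79": (100_000_000, 10),
    "1880-90": (132_000_000, 11),
}

# Divide each total by its own number of years before comparing.
yearly_average = {name: total / years for name, (total, years) in periods.items()}

for name, value in yearly_average.items():
    print(name, value)   # 10,000,000 and 12,000,000 pounds per year
```

Compared as totals the second period looks 32 per cent. larger; compared as yearly averages it is 20 per cent. larger, which is the figure relevant to a comparison of periods of unequal length.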

[Margin: Averages as rates.]
It will be clear that percentage is a special case of this
use of average. It is useless when comparing the growths
of population or of trade to give only the whole numbers.
An increase of 50,000 in the population of London is not so
significant as one of 10,000 in that of Harrow; they must be
expressed as increases of 1 per cent. and 150 per cent., say,
before their meaning can be appreciated, and this is the same
thing as giving the average increase to 100 inhabitants. For this
reason the records of births, deaths, and marriages are always
given as rates, so many per 1,000 inhabitants; and in these cases
a double average is given, for the rates signify so many per
1,000 inhabitants per annum.




Another extension of the same use is found when quantities
are reduced to rates "per head" of the population. This
use is solely for comparison, and the principle employed is
that of the common denominator. It would be futile to state
that the amount spent on drink was, say, £100,000,000 in 1860
and £110,000,000 in 1890; but the corresponding statements
that the amounts were £3 10s. per head in 1860 and £2 15s.
per head in 1890 would make a comparison possible. Or, to
take a better instance: in studying the increments in the values
of England's foreign trade, an entirely wrong view is obtained,
unless we calculate for each year the value per head of the
population,* instead of looking only at the totals. A neglect
of this division would make municipal expenditure appear to
be growing much faster than it really is; and in preparing
any comparative summary of figures, it is always necessary
to consider whether such an average should be taken.

[Margin: Preliminary definition.]
So far, the averages considered are simply arithmetical, and
satisfy the following definition:

Average × number to which it applies = total quantity dealt with,
e.g., Average weight × number of crew = total weight of crew.

Average value of imports per head of population × number of
population = total value of imports, and so on.
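The preliminary definition can be tried on the crew example given earlier in the chapter. The following Python lines are only a sketch; the constant and variable names are ours, and the conversion used is the ordinary one of 8 stone to the hundredweight (14 lb. to the stone, 112 lb. to the cwt.).

```python
STONE_PER_CWT = 8          # 1 cwt. = 112 lb., 1 stone = 14 lb.

crew_size = 8              # number to which the average applies
average_stone = 12.5       # average weight of a man, in stone

# Definition: average x number to which it applies = total quantity.
total_stone = average_stone * crew_size
total_cwt = total_stone / STONE_PER_CWT

print(total_stone)  # 100.0 stone for the eight men
print(total_cwt)    # 12.5 cwt., the figure quoted in the text
```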

[Margin: Its inapplicability.]
The following question, however, will lead us further. The
average weekly agricultural wages in 1892 in Wilts, Dorset,
Devon, Cornwall, and Somerset were 10s., 10s., 13s. 6d., 14s.,
11s. respectively. What was the average in the south-west of
England?

The simplest method is to say, the average was

$$\frac{10\text{s.} + 10\text{s.} + 13\text{s. }6\text{d.} + 14\text{s.} + 11\text{s.}}{5} = \frac{58\text{s. }6\text{d.}}{5} = 11\text{s. }8.4\text{d.},$$

and for many purposes this would be sufficient; but it does
not satisfy the above definition. For when we ask the double
question "11s. 8.4d. multiplied by what number equals what
total?", we can only answer that 11s. 8.4d. multiplied by the
number of items equals the sum of items.
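The simple average of the five county wages may be verified in a few lines of Python. The helper `to_pence` is a name of our own; sums are worked in pence (12d. to the shilling) and converted back at the end.

```python
def to_pence(shillings, pence=0):
    """Convert a sum in shillings and pence to pence (12d. = 1s.)."""
    return 12 * shillings + pence

county_wages = {
    "Wilts": to_pence(10),
    "Dorset": to_pence(10),
    "Devon": to_pence(13, 6),
    "Cornwall": to_pence(14),
    "Somerset": to_pence(11),
}

total_pence = sum(county_wages.values())          # 702d. = 58s. 6d.
simple_average = total_pence / len(county_wages)  # 140.4d.
shillings, pence = divmod(simple_average, 12)

print(total_pence, simple_average)
print(int(shillings), round(pence, 1))            # 11s. 8.4d.
```

The only total this average reproduces is the sum of the five items, which is the point made in the text: no number of labourers enters the calculation.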

We must consider further what we understand by the ex- 
pressions "average wage in each county," and "average wage 
in the group of five counties." 




It may be supposed that the average wage in Wilts, for
instance, was compiled by getting returns from different villages,
say 12s., 11s., 9s., 9s. 6d., 10s. 6d., 9s., 9s., adding them and
dividing by the number of villages. This of course satisfies
our definition no better than the former. What is to be
understood by the average in each village? If our present
definition is to be satisfied, it should be the total of the wages
paid in the village divided by the number of workers. It is
hardly necessary to say that this total is never found in such
an investigation, and the average is given from observation or
by guess-work, not by calculation.

If, however, the village average was correct, and we had
returns from all the villages in the county, we should find
the county average as follows:

$$\frac{12\text{s.} \times 200 + 11\text{s.} \times 150 + 9\text{s.} \times 300 + 9\text{s. }6\text{d.} \times 150 + 10\text{s. }6\text{d.} \times 400 + 9\text{s.} \times 200 + 9\text{s.} \times 200}{200 + 150 + 300 + 150 + 400 + 200 + 200} = 9\text{s. }11.8\text{d.},$$

where the numbers in the denominator are the numbers of
labourers in the respective villages. We should then have the
same result as if we had had the wages of all the labourers
in the county put down on a sheet, added up, and divided by
their number, and the average would satisfy the definition.

It is clear that we can simplify this arithmetical work, 
for if we divide throughout by 50 we get the same result; 
this is as if we said there were 4, 3, 6 . . . labourers in the 
villages instead of 200, 150, .. . Thus we get the same 
result if we take numbers proportional to the total numbers 
of the labourers instead of the actual numbers. This plan 
has two advantages : first, that though we do not know the 
numbers of labourers, we know numbers nearly proportional 
to them, viz., those included in the census returns under the 
general headings relating to agriculture ; and secondly, we need 
not choose our numbers with absolute exactness ; thus the 
numbers of labourers above given may be supposed to be round 
numbers substituted for 213, 145, 320 . . . ; and it will presently 
be seen that such differences hardly affect the average. We 
idealize the village, and suppose it to contain round numbers ; 
and then for the numerical work take simple numbers pro- 
portional to these. This is important as simplifying numerical 
work. 
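The two advantages just claimed for proportional numbers can be tried on the village figures. In the Python sketch below the function names are ours. The first check divides every number of labourers by 50; the second replaces the first three round numbers by 213, 145, 320, the unrounded figures mentioned in the text, and, as an assumption made only for illustration, leaves the remaining four villages at their round numbers, since the text does not state their unrounded values.

```python
def weighted_average(pairs):
    """Weighted average of (value, weight) pairs."""
    total_weight = sum(weight for _, weight in pairs)
    return sum(value * weight for value, weight in pairs) / total_weight

def shillings_and_pence(shillings):
    """Split a sum in decimal shillings into whole shillings and pence."""
    whole = int(shillings)
    return whole, (shillings - whole) * 12

# (weekly wage in shillings, number of labourers) for the seven villages.
villages = [(12.0, 200), (11.0, 150), (9.0, 300), (9.5, 150),
            (10.5, 400), (9.0, 200), (9.0, 200)]

exact = weighted_average(villages)

# Proportional weights: 4, 3, 6, 3, 8, 4, 4 instead of 200, 150, 300, ...
proportional = weighted_average([(wage, n // 50) for wage, n in villages])

# Assumed illustration: unrounded numbers for the first three villages only.
unrounded = [(12.0, 213), (11.0, 145), (9.0, 320)] + villages[3:]
perturbed = weighted_average(unrounded)

print(shillings_and_pence(exact))   # 9s. and about 11.8d.
print(proportional == exact)        # True: only the proportions matter
print(abs(perturbed - exact) * 12)  # difference in pence, a small fraction of 1d.
```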

Averages obtained for the county in this way do not absolutely
satisfy our definition, but are very nearly equal to those that
do. We can then proceed to take the average for the south-west
of England on the same principles.

B. Weighted Averages. This discussion introduces and
gives an example of the very important statistical method
known as "weighting the average." We may illustrate it
further from the same figures by considering what weights to
apply to get this average for South-West England. We may find
the number of agricultural labourers in the counties and work
out the average thus:

$$\frac{10\text{s.} \times 20{,}000 + 10\text{s.} \times 30{,}000 + \dots}{20{,}000 + 30{,}000 + \dots};$$

or we may argue that since we have no means of knowing the
exact numbers of labourers we may as well arrange the
weights, according to the importance of the counties, say
20,000, 30,000, &c., from some other point of view, and
take numbers representing such quantities as the amounts
of wheat produced, the area, or the rate of increase of
population. In this particular case these methods would be
absurd, but in other problems the weights are not so obvious.
Suppose, for example, that we are considering the attraction
of London on the inhabitants of various counties; that we are
told that so many immigrants arrive from Essex, Norfolk, and
Suffolk, and so many from Stafford and Worcester, and we are
asked to compare the attractive power on the agricultural and
manufacturing counties. Should we weight the numbers given
by the total numbers of inhabitants of the contributing counties,
or by their distance from London, or by some quantity derived
from these?

A more practical problem, the classical and most useful
application of weights, is the formation of an index number
for the change of prices by fitting suitable weights
to the changes measured in the prices of various
commodities. This will be considered separately,* but it is best
to deal with the first principles here. It is required to find the
change in the value of gold when measured by the prices of other
commodities. Suppose that we are given that the prices of
certain commodities between two years were in the following
ratios:

* See infra, Chap. IX.



                 Wheat.    Silver.    Meat.    Sugar.    Cotton.
First Year        100       100       100      100       100
Second Year        77        60        90       40        85

The simplest way to estimate for the general fall in price is
to take the simple average of the numbers in the second year,
viz., 70.4; and say that general prices in the second year were
70.4 per cent. of those in the first, and the value of gold had
increased in the ratio 70.4 : 100 when expressed in commodities.
But it is at once clear that we cannot allow the commodities
given to have equal influences on the result; wheat is of greater
importance than sugar and meat than silver; and again we
have taken arbitrarily three items to represent food and one for
clothing; we need some means of deciding relative importance.
Suppose we decide that wheat, cotton, meat, and sugar are
respectively 7, 4, 3 times and twice as important as silver, we
should get the following table:



Commodity.    Relative Price in    Weight Assigned.    Product.
              Second Year.
Wheat               77                    7               539
Silver              60                    1                60
Meat                90                    3               270
Sugar               40                    2                80
Cotton              85                    4               340
                   ---                   --              ----
                   352                   17              1289

Weighted average is $\frac{1289}{17} = 75.8$.

Unweighted average $\frac{352}{5} = 70.4$.

This process is equivalent to writing down the price of wheat
seven times, silver once, meat thrice, &c., and then taking the
simple average of these numbers.
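The table, and the remark that weighting is equivalent to writing each price down as many times as its weight, can both be reproduced in a short Python sketch. The dictionary names are ours; the prices and weights are those of the table.

```python
relative_price = {"Wheat": 77, "Silver": 60, "Meat": 90, "Sugar": 40, "Cotton": 85}
weight = {"Wheat": 7, "Silver": 1, "Meat": 3, "Sugar": 2, "Cotton": 4}

# Unweighted (simple) average of the second-year prices.
unweighted = sum(relative_price.values()) / len(relative_price)

# Weighted average: sum of products divided by sum of weights.
sum_of_products = sum(relative_price[c] * weight[c] for c in relative_price)
sum_of_weights = sum(weight.values())
weighted = sum_of_products / sum_of_weights

# Equivalent process: write each price down as many times as its weight,
# then take the simple average of the lengthened list.
repeated = [relative_price[c] for c in relative_price for _ in range(weight[c])]
average_of_repeated = sum(repeated) / len(repeated)

print(sum_of_products, sum_of_weights)       # 1289 17
print(round(unweighted, 1), round(weighted, 1))  # 70.4 75.8
print(average_of_repeated == weighted)       # True
```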

[Margin: Mechanical illustration.]
The idea is made clearer by the mechanical analogy in which
the word weight originated. Suppose a uniform weightless rigid
rod graduated in 100 equal divisions, and equal weights hung at
the 77th, 60th, 90th, 40th, and 85th divisions from one end; the
rod will then balance at a point corresponding to the unweighted
average, 70.4 intervals from the same end. Now, suppose the
equal weights replaced by weights of 7, 1, 3, 2, 4 lbs.
respectively, and the rod will balance at a point corresponding
to the weighted average, 75.8 intervals from the same end. The
further any particular mass is moved, or the heavier it is, the
more the centre of gravity will be shifted; and this clearly
corresponds to the influence we should wish the various prices
to have in the statistical problem. The formula in use in
Statics, $\bar{x} = \frac{\Sigma wx}{\Sigma w}$, which corresponds
to the arithmetic on the previous page, can also be used in
Statistics.
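The mechanical statement can be checked by computing the turning moments about the supposed balancing point: the rod balances where the moments of the weights on either side cancel. This is a sketch in Python with function names of our own, using the positions and weights of the text.

```python
positions = [77, 60, 90, 40, 85]   # divisions of the rod at which weights hang

def balance_point(positions, weights):
    """Centre of gravity: sum of (weight x position) over sum of weights."""
    return sum(w * x for x, w in zip(positions, weights)) / sum(weights)

def net_moment(positions, weights, pivot):
    """Total turning moment of the weights about the pivot."""
    return sum(w * (x - pivot) for x, w in zip(positions, weights))

equal = [1, 1, 1, 1, 1]
unequal = [7, 1, 3, 2, 4]          # lbs., in the order of the positions

pivot_equal = balance_point(positions, equal)
pivot_unequal = balance_point(positions, unequal)

print(round(pivot_equal, 1), round(pivot_unequal, 1))      # 70.4 75.8
print(abs(net_moment(positions, equal, pivot_equal)) < 1e-9)      # True
print(abs(net_moment(positions, unequal, pivot_unequal)) < 1e-9)  # True
```

About any other pivot the net moment is not zero, which is the sense in which the heavier or more distant weights pull the average towards themselves.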

[Margin: The small effect of weights.]
The discussion of the proper weights to be used in this and
other averages has occupied a space in statistical literature out
of all proportion to its significance, for it may be said at once
that no great importance need be attached to the special
choice of weights; one of the most convenient facts of
statistical theory is that, given certain conditions, the same
result is obtained whatever logical system of weights is applied.
We must postpone the mathematical analysis of this proposition,
but may offer immediately some arithmetical illustrations.

[Margin: Example from the Wage Census.]
The table below affords an example of this principle,* and is
worth careful study. At the commencement of the Wage Census,
circulars were sent to all the principal firms in all well-located
trades, asking for details as to wages. Of these some were not
returned, and the numbers allotted in the Final Report to each
trade are not the numbers which actually belong to the trade in
the whole country, but the numbers of those in the firms which
made returns. The average wage given is not therefore the
arithmetic average for these trades for the whole country
corresponding to the definition given above for average, but the
average of the average wages as returned in each trade weighted
by the numbers for whom returns were made; so that the average
wage given for the whole group of trades might have proved to
be different, if with the same average in each trade the returns
had been complete. It is very unlikely, however, that there would
have been any great difference. In the table several systems of
weighting are used; the first are the numbers in these returns,
giving an average, 24s. 7d.; the second are the numbers belonging
to each trade according to the census when they are above
a certain minimum, giving an average 25s. 3d.; the third is a
purely arbitrary list of figures taken from a source which has no
connection with wages, and the average is 24s. 5½d.; the last is
the unweighted average, that is, all the weights are equal, and the
average is now 24s. 2d. These averages are close together, while
the original items vary from 16s. 6d. to 30s. 5d. It is to be
noticed that the true weights are not known in this case, but
that owing to this principle we are able to dispense with them
entirely.

* From the Statistical Journal, December 1897, with corrections.

Examples of the Smallness of the Change Introduced by
Difference in Systems of Weighting.

From the Wage Census.

                                         Average    Number       Numbers       Arbitrary    Equal
                                         Wages      Included     Employed      System of    Weights.
                                         (Men).     in Returns.  in Trade      Weights.
                                                                 when known.
Trade.                                   s.  d.                  Unit 1,000.
Cotton Manufacture                       25   3      32,189       142           144          1
Woollen Manufacture                      23   2      12,248        54           172          1
Worsted and Stuff Manufacture            23   4       7,005        38           219          1
Linen Manufacture                        19   9       6,807        22            96          1
Jute Manufacture                         19   4       2,799         9       [illegible]      1
Hemp, &c., Manufacture                   23   6       1,232         3            78          1
Silk Manufacture                         22   3       2,248        10           189          1
Carpet Manufacture                       26   7       1,292       ...           213          1
Hosiery Manufacture                      24   5       1,070         8           287          1
Lace Manufacture                         27   3         593         8            51          1
Smallwares Manufacture                   20   2       2,734       ...           225          1
Flock and Shoddy Manufacture             21   2         330         2           200          1
Coal, Iron Ore, and Ironstone Mines      22  11      67,429        57           142          1
Metalliferous Mines                      16   6       5,046       ...           190          1
Shale Mines and Paraffin Oil Works       25   0       3,021       ...           207          1
Slate Mines and Quarries                 22   1       6,933       }             232          1
Granite Quarries and Works               21  11       2,315       } 12          206          1
Stone Quarries                           23  10       3,956       }              34          1
China, Clay, &c., Works                  18   8         499       ...       [illegible]      1
Police                                   27   7      52,682        58           224          1
Roads, Pavements, and Sewers             20   9      24,276       ...            29          1
Gasworks                                 27   2      27,965       ...            40          1
Waterworks                               24   9       5,187       ...           151          1
Pig Iron (Blast Furnaces)                24   6       6,234       ...           128          1
General Engineering Iron and Brass
  Foundries and Machinery Trades         25   9      41,658       200           173          1
Shipbuilding, Iron and Steel             29   3      10,661        80           228          1
Tinplate Works                           33   5      11,514       ...           178          1
Saw Mills                                24   3       2,088       ...           174          1
Brass Works and Metal Wares              29   7       1,838       ...           222          1
Shipbuilding, Wood                       28   4         454       ...            79          1
Cooperage Works                          30   5         327       ...           165          1
Coach and Carriage Building              26   6       1,664       ...            28          1
Boot and Shoe Making                     24   3       2,902       ...           142          1
Breweries                                24   3       8,366       ...            46          1
Distilleries                             20   4       1,795       ...           129          1
Brick and Tile, &c., Making              22  10       3,188       ...            55          1
Chemical Manure Works                    23   0       1,054       ...           210          1
Railway Carriage and Wagon Building      25   2       2,239       ...           233          1

                                                     s.  d.       s.  d.        s.  d.       s.  d.
Averages                                             24   7       25   3        24   5½      24   2
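The last of the four averages, the unweighted one, can be recomputed directly from the column of average wages. The Python sketch below uses the 38 wage entries as they stand in the table above (shillings and pence); the list name and the helper arithmetic are ours, and only the equal-weights average is attempted, since some entries in the other weight columns are incomplete in this copy.

```python
# Average wages (men) for the 38 trades, in (shillings, pence), in table order.
wages = [
    (25, 3), (23, 2), (23, 4), (19, 9), (19, 4), (23, 6), (22, 3), (26, 7),
    (24, 5), (27, 3), (20, 2), (21, 2), (22, 11), (16, 6), (25, 0), (22, 1),
    (21, 11), (23, 10), (18, 8), (27, 7), (20, 9), (27, 2), (24, 9), (24, 6),
    (25, 9), (29, 3), (33, 5), (24, 3), (29, 7), (28, 4), (30, 5), (26, 6),
    (24, 3), (24, 3), (20, 4), (22, 10), (23, 0), (25, 2),
]

# Work in pence (12d. to the shilling), giving every trade the same weight.
pence = [12 * s + d for s, d in wages]
mean_pence = sum(pence) / len(pence)

# Round to the nearest penny and convert back to shillings and pence.
shillings, odd_pence = divmod(round(mean_pence), 12)

print(len(pence), sum(pence))   # 38 trades, 11030d. in all
print(shillings, odd_pence)     # 24 2, i.e. 24s. 2d.
```

The result agrees with the 24s. 2d. printed at the foot of the equal-weights column.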

The problem dealt with in the next table is to find the aver- 
age weekly agricultural wage in England and Wales from the 

returns for Michaelmas 1869 and Lady Day 1870, 
swTOTagerunder given in columns I and 2. There are very many 
many systems different ways of taking this average, some of 
which are as follows : — ^Take the average of summer 
and autumn for each county, as in column 3, and then the un- 
weighted average of these 45 numbers ; this is 12s. 7d. Suppose 
the summer wage to be paid twice as long as the autumn wage, 
as in column 4, and proceed as before; the average is 12s. 5|d., 
the slight difference being due to the inclusion of harvest pay- 
ments in the Michaelmas wage, which makes them higher on the 
whole than the summer wages. Again, divide the counties into 
geographical groups, take the simple average for each group 
(the figures marked a in column 3 and b in column 4) and 
weight these by the figures marked c in column 5, the numbers 
of agricultural labourers in each group ; the average of the a 
figures with the c weights is 12s. 5d., of the b figures with the c 
weights is 1 2s. 4d. Again, weight the figures for each county 
in column 4 with the numbers in column 5, the most obvious 
method of all; the average is then 12s. 4d. Again, take the 
simple average of the district averages a and ^, that is, give each 
of the eight districts equal weights; the averages are 12s. 4fd. 
and I2S. 3jd. Or take the simple average of column 3, counting 
Yorkshire and Wales each as one county ; it is 12s. 8d. 

To obtain new groups, take as weights not the number of 
agricultural labourers, but the total population of the districts, 
the numbers marked d. Exclude the population of London as 
exerting a preponderating influence unconnected with agriculture. 
A new factor is now introduced, for population is greatest in the 
manufacturing districts, where agricultural labour is of compara- 



4ik 


: ■ ■ i i ; 


fr : : : : : 


5- 


ll:-:li 


^ ■■::!: 


= 


M 




?a23;?8 


g. ..^o. 




fpi 


., O " - T'O « 


r. ♦««" = 


" 


«0'o'Sc.«oo.|« *«-S-r. ooo 


■^ 


"'"""" 


- --"-" 


^ U 


^M 


•i" n'?nn<^ 


»«^*<nc 


■ScoT^rr^OO-S 


.^.-..oo = 


■^ 


""""""Ig ""-"- 


- -------- 


g"""" """ 


' m 


MOO.ot,oo 


■ ra^s-i 


■<> O nt! O'O Ot! 


o.* >o o « o o o 


■: 


' III 


SO<^no«o 


^■7=7^. 


■ SS2'Er'2 2'S"S 


^n==^=^; 










Stafford - 
Gloucester 
Hereford 
Salop - 
Wor^cer 
Warwick 


Average 

Rutland - 
Lincoln - 
Notts 
Derby . 

Average 
Cheshire - 
Lanes. - 
Yorks, W. 
Yorks, N. 
Durham - 


Average 
Monmouth 

Caer mar then 

Pembroke 

Cardigan 

Brecknock 

Radnor - 

Carnarvon 


1 


* 


Ihl 


nn. 


, ..M. ::M= n.M.I 




s^jag ^ 8s>=^&? 1 s s,? 


a 'StStJ?, 


2 


'hpi 


■^H« flCOB 


B-nl" 


2 ns:: 


S 222=2 


a 


'M 


■^••°—"^° 






S 2<^2 = 2 


s 


' m 


■^ocoao 


g.coo 




■: TbZI 


i 




, 


ill 


^«o«oo 


•^■^°° 


° 


. ::: 


O^JOOO 




1 



AVERAGES. II7 

tively little importance, but receives high wages ; these high 
wages have undue weight, and the average of the figures b with 
weights a is brought up to 13s. ifd. The "median" of the 
county averages in column 4 is 12s. id. If column 4 is rewritten 
correct only to the nearest is., and column 5 to the nearest 
10,000, the weighted average is 12s. 5d. If column 3 is 
weighted with random numbers quite unconnected with the 
problem, viz., the successive digits in the third decimal places 
of the logarithms of the numbers 2 to 46, the average is 12s. 
lofd. The reader may try any other system of logical or 
absurd weights, and he will find that unless there is some bias 
in the selection of weights, or great preponderance is given to a 
few counties, the average will be little affected.

Since the true system of weights which would reduce the 
general average to our definition must be allied to some of those 
here adopted, and can hardly show greater divergence from 
12s. 4d. than these do, we may feel confident that the true
average is within, say, 3d. of this figure. The original items
varied from 8s. 6d. to 19s. ; the averages, even those based
on the most extravagant methods, are contained by the limits
12s. and 13s. ifd. Without some such argument as this we
should have no clue to the magnitude of the error introduced by 
erroneous weights. It is never safe, however, to assume that
weights can be neglected, and an unweighted average used,
without first examining the group in question, trying various
systems, and seeing that the resulting average is stable.
[Side-note: Weights cannot in general be neglected, but only
the errors in their estimation.] This will only be the case if
there is no connection between the size of the quantities and
the magnitude of the weights. Thus if we are
dealing with wages in towns, and are calculating the average for
all towns taken together, we shall obtain too small a result if we 
ignore weights and count all towns as equal, for the higher wages 
are paid in the larger towns. Thus, as on pp. 134-5 below, the 
average of the recognised wages of 117 branches of the Amalga-
mated Society of Engineers was 32s. 4d. in 1891 if we count all
the branches as equal ; but was up to 33s. 4d. if we weight the 
wage at each of the branches with the number of members 
belonging to it. But, though we cannot neglect weights entirely 
in such cases, we need to make only a very rough estimation for 
them if there is no preponderating influence exerted by a small 
minority of places. In this case London, with a wage higher 
than any other district, except Dartford and Enfield Lock, and
with nearly one-sixth of the total number of members dealt with,
exerts such an influence. If, giving London its due importance, 
we take as weights the numbers belonging to the branches to the 
nearest hundred, we obtain the average 33s. 6d., practically the 
same as before. Each group for which an average is to be 
calculated must be treated on its merits ; in many cases the 
weights may be neglected entirely ; in nearly all cases, where the 
group consists of many items, even moderately large errors in 
computing weights may be neglected. Examination of the data 
will generally determine the importance of such errors. 

This principle is of great importance. In many cases the 
true weights are incalculable or even undefinable ; but now it is 
seen that, given certain conditions, there is no need to calculate 
or define the weights ; in many other cases the weights cannot 
be known exactly, but exactness is not necessary. No system 
of weights, however, can remove an original bias common to all 
the items. If, for example, wages throughout were 1s. less than
here reckoned, the calculated average would be 1s. too high. So
we arrive at a very important precept: in calculating averages
give all care to making the items free from bias, and do not strain
after exactness in weighting.
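
The precept may be tried numerically. The following is a minimal Python sketch, not the author's computation: the wages (in pence) and all three systems of weights are hypothetical figures invented for illustration, since the county table is not legible here. It compares the weighted average under different weights and then shows that a bias of 1s. (12d.) common to every item passes unaltered into the average.

```python
def weighted_average(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical weekly wages in pence for eight districts (not the book's data).
wages = [144, 150, 156, 138, 162, 147, 153, 141]

# Three hypothetical systems of weights: equal, rough, and frankly arbitrary.
systems = {
    "equal": [1] * len(wages),
    "rough population": [30, 50, 20, 40, 60, 30, 50, 20],
    "arbitrary digits": [3, 0, 7, 9, 1, 5, 4, 8],
}

averages = {name: weighted_average(wages, w) for name, w in systems.items()}
spread = max(averages.values()) - min(averages.values())

# A bias common to all the items is not removed by any system of weights.
biased_wages = [x + 12 for x in wages]
shifts = {
    name: weighted_average(biased_wages, w) - averages[name]
    for name, w in systems.items()
}

for name in systems:
    print(f"{name:>17}: average {averages[name]:.2f}d., shift {shifts[name]:.2f}d.")
print(f"spread between systems: {spread:.2f}d.")
```

With these invented figures the items range over 24d., while the averages under the three systems differ by only a few pence; the shift caused by the common bias is 12d. under every system.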

C. The Mode. — We pass to the consideration of two other 
means in common use among statisticians but unfortunately not 
yet consciously introduced into common parlance. There are, 
however, some popular phrases which, if they have any definite 
meaning, very nearly resemble the averages in question.
[Side-note: The average man.] When
we hear of the average clerk, the average working-man,
the phrases admit many interpretations. In
some way these persons are supposed to be types of their kind.
The average clerk may be supposed to mean the one who 
receives the average income of all clerks, whose expenditure on 
necessaries and on luxuries is the average of all of his class, who 
takes the average amount of interest in his work, is of average 
ability and average age. It will be seen that this clerk is ideal, 
and not to be found in any random assembly of half-a-dozen ; 
for each of these will have some peculiarity, some quality in 
which he differs from the average ; the average man of the news- 
papers does not exist in the flesh, but is an imaginary person to 
whom certain attributes are attached. 

[Side-note: Quetelet's average man.] Quetelet's average man is familiar ; * he is of average height,
weight, strength, girth and lung capacity, with eyes of normal
range and medium tint ; but he is a more satisfactory
model than the newspapers' average, for in
regarding him we see the type from which all other men may be
supposed to have deviated ; the creature that would have been
produced if all disturbing causes were removed. That any actual
person should answer exactly to all these standards is of course
in the highest degree improbable.

* See Quetelet's Physique Sociale ; and Edgeworth in Statistical Journal,
December 1893.

Quetelet refers neither to the arithmetic average, the median
nor mode, but to a mean about which all the similar measure-
ments are grouped in accordance with a definite law, the obedience
of anthropometrical measurements to which was his chief theme.

[Side-note: The mode.] The newspaper average, on the other hand, seems to be the
mode, the position of the greatest density, which may be
explained as follows : Referring back to the table of
American wages, p. 91, or the table on next page,
it will be noticed that in looking down column 2 we find the 
numbers increase till we come to 685 (between $1.15 and $1.24), 
and then after fluctuations diminish. This number, 685, is the 
greatest which occurs in any 10-cent group ; and its position is
called the mode, or the position of greatest density, or the position
of the maximum ordinate, or the rate is spoken of as predominant.

[Side-note: Method of determining the mode.] In this column 2 we have, however, 14 maxima in the
correct sense of the word, the numbers rise and fall with little
regularity, and there are 14 modes of which that at
$1.15-$1.24 is the most pronounced. But if the
groups are made wider, and the numbers entered
as in column 6 in half-dollar limits, there are only three modes,
or if we neglect the small group of 8 at $5.00 only two. The
position of the largest group of 1,472 is not at once assignable
more closely than as between .75 and 1.25. We can get a little
closer by the following method :

Numbers earning
as much as      and not as much as      Number
  $0.65               $1.15                944
   0.75                1.25              1,472
   0.85                1.35              1,458
   0.95                1.45              1,747
   1.05                1.55              2,012
   1.15                1.65              1,780
   1.25                1.75              1,297
   1.35                1.85              1,527
   1.45                1.95              1,127
   1.55                2.05                934

Determination of the Mode.
Numbers of Wage-Earners from the Senate Report, 1893, U.S.A.

[The columns headed "In 20-cent Groups" and "In 30-cent Groups" are not legible in this copy ; their figures are quoted in the text below.]

Daily wage (col. 1)      Number (col. 2)
From $ .25 to $ .34            1
      .35  "   .44           15
      .45  "   .54           59
      .55  "   .64           85
      .65  "   .74          157
      .75  "   .84          113
      .85  "   .94          169
      .95  "  1.04          201
     1.05  "  1.14          304
     1.15  "  1.24          685
     1.25  "  1.34           99
     1.35  "  1.44          458
     1.45  "  1.54          466
     1.55  "  1.64           72
     1.65  "  1.74          202
     1.75  "  1.84          329
     1.85  "  1.94           58
     1.95  "  2.04          273
     2.05  "  2.14           45
     2.15  "  2.24          265
     2.25  "  2.34           33
     2.35  "  2.44          101
     2.45  "  2.54          196
     2.55  "  2.64           13
     2.65  "  2.74          163
     2.75  "  2.84            2
     2.85  "  2.94           15
     2.95  "  3.04          129
     3.05  "  3.14            5
     3.15  "  3.24           47
     3.25  "  3.34           12
     3.35  "  3.44            0
     3.45  "  3.54          221
     3.55  "  3.64            5
     3.65  "  3.74           16
     3.75  "  3.84           11
     3.85  "  3.94            0
     3.95  "  4.04           82
     4.05  "  4.14            0
     4.15  "  4.24            3
     4.25  "  4.34            0
     4.35  "  5.34      [ten groups, all small ; entries not reliably legible]

In 50-cent Groups (col. 6), beginning at $.25 :
     $ .25 to $ .74         317
       .75  "  1.24       1,472
      1.25  "  1.74       1,297
      1.75  "  2.24         970
      2.25  "  2.74         506
      2.75  "  3.24         198
      3.25  "  3.74         254
      3.75  "  4.24          96
      4.25  "  4.74     [not legible]
      4.75  "  5.24           8

In 50-cent Groups, beginning at $.55 :
     $ .55 to $1.04         725
      1.05  "  1.54       2,012
      1.55  "  2.04         934
      2.05  "  2.54         640
      2.55  "  3.04         322
      3.05  "  3.54         285
      3.55  "  4.04         114


Now the greatest number is in the group $1.05-$1.55, and
the "mode" may be stated as near the middle point of the
group, viz., $1.30, not at this point, for there are only 99 wage-
earners in the group $1.25-$1.34.
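
The half-dollar totals just used may be reproduced by sliding a window of five 10-cent groups along the table. The sketch below is an illustration in Python rather than the author's own procedure; it uses the 10-cent counts from $0.25 to $2.04 as printed, and the names `COUNTS` and `sliding_totals` are introduced here for the purpose.

```python
# Ten-cent counts from the table, $0.25-$0.34 up to $1.95-$2.04.
COUNTS = [1, 15, 59, 85, 157, 113, 169, 201, 304, 685,
          99, 458, 466, 72, 202, 329, 58, 273]
LOWEST_CENTS = 25   # lower limit of the first 10-cent group
STEP_CENTS = 10     # width of each group


def sliding_totals(counts, width):
    """Total of every run of `width` consecutive 10-cent groups."""
    return [sum(counts[i:i + width]) for i in range(len(counts) - width + 1)]


totals = sliding_totals(COUNTS, 5)
best = max(range(len(totals)), key=totals.__getitem__)
lower = LOWEST_CENTS + STEP_CENTS * best
upper = lower + STEP_CENTS * 5

print(f"${lower / 100:.2f} to ${upper / 100:.2f}: {totals[best]}")
```

The fullest half-dollar range begins at $1.05, and its middle point is $1.30, as stated above.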

Another method of approximating to the mode may be
illustrated as follows : When the numbers are tabulated in
10-cent groups, as on p. 91, the mode is quite indeterminate ;
in 20-cent groups the successive numbers beginning at .25-.44
are 16, 144, 270, 370, 989, 557, 538, 531, &c., and the number
989 (in the group $1.05-$1.24) is a distinct mode ; if we begin
the 20-cent groups at .35-.54, the numbers are 74, 242, 282, 505,
784, 924, 274, &c., and 924 (in the group $1.35-$1.54) is a mode ;
by this double tabulation it is seen that the 20-cent grouping
does not decide the mode. In 30-cent groups we have 355, 674,
1,242 ($1.15-$1.44), 740, &c., if we begin with $.55-$.84 ; we
have 439, 1,190 ($.95-$1.24), 1,023, &c., if we begin with $.65-$.94 ;
and 483, 1,088 ($1.05-$1.34), 996, &c., if we begin with $.75-
$1.04 : the mode by each of these groupings lies in a group
which contains $1.15 to $1.24, and this smaller group may be
assumed to contain the mode, which is thus at or near $1.20.
The example here taken is drawn from a group of very irregular
figures, which specially illustrate the difficulties. The method
just adopted may be summarised thus : Tabulate the figures
again and again in gradually widening groups till regularity is
obtained ; then examine again the groups which have the selected
width and see if the mode is shifted when the lower limit of the
grouping is moved ; if it is shifted the groups are not wide
enough ; if it is not, the mode is in the smallest group common
to the larger equal groups which all contain it. A more accurate
diagrammatic method is described on p. 154.
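
The method just summarised lends itself to mechanical trial. The following Python sketch is one possible rendering, offered under assumptions: the function names are invented here, only the legible 10-cent counts ($0.25 to $4.34) are used, and "the mode is not shifted" is interpreted as "the fullest groups, for every starting point, share at least one 10-cent group".

```python
# Ten-cent counts from the table, $0.25-$0.34 up to $4.25-$4.34.
COUNTS = [1, 15, 59, 85, 157, 113, 169, 201, 304, 685,
          99, 458, 466, 72, 202, 329, 58, 273, 45, 265,
          33, 101, 196, 13, 163, 2, 15, 129, 5, 47,
          12, 0, 221, 5, 16, 11, 0, 82, 0, 3, 0]
LOWEST_CENTS = 25
STEP_CENTS = 10


def regroup(counts, width, offset=0):
    """Sum consecutive runs of `width` groups, skipping the first `offset`."""
    usable = counts[offset:]
    return [sum(usable[i:i + width])
            for i in range(0, len(usable) - width + 1, width)]


def modal_bins(counts, width, offset):
    """Indices of the 10-cent groups inside the fullest regrouped class."""
    groups = regroup(counts, width, offset)
    start = offset + groups.index(max(groups)) * width
    return set(range(start, start + width))


def mode_by_regrouping(counts, width):
    """10-cent groups common to the fullest class for every starting point.

    An empty list means the mode shifts with the starting point,
    i.e. the classes are not yet wide enough.
    """
    common = set(range(len(counts)))
    for offset in range(width):
        common &= modal_bins(counts, width, offset)
    return sorted(common)


def bin_limits(index):
    """Lower and upper limit, in cents, of the 10-cent group `index`."""
    lower = LOWEST_CENTS + STEP_CENTS * index
    return lower, lower + STEP_CENTS - 1


for width in (2, 3):
    found = mode_by_regrouping(COUNTS, width)
    print(f"{width * 10}-cent groups:", [bin_limits(i) for i in found])
```

With 20-cent groups nothing is common to the two tabulations, so the grouping does not decide the mode; with 30-cent groups the three tabulations agree on the single group $1.15 to $1.24.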

[Side-note: Indefiniteness of the position of the mode.] Even when our numbers are initially regular, it is seldom
easy to determine the mode exactly. The difficulty
is best seen by an example. Suppose that
we have the following returns as to heights of
a large number of men :

67 in.     455
67¼ "      475
67½ "      490
67¾ "      500
68  "      485
68¼ "      467
68½ "      445


At first sight the mode appears to be at 67¾ in. exactly ; but it
must be remembered that even in accurate measurements all
heights within ⅛ in. of 67¾ in. will be entered as 67¾ if the
measurements are taken to the nearest quarter inch, or will have
been tabulated in this way if the measurements were more accu-
rate. Hence 67¾ in. in reality stands for from 67⅝ to 67⅞ in.
If the 500 heights so entered were distributed uniformly through
this interval, the mode might be given as 67¾ in. with fair
accuracy ; but there are signs in the figures that the mode is
below this. Suppose that the figures in reality come from the
following measurements :

From 67⅜ to 67½ in.   238
  "  67½ "  67⅝  "    245 }
  "  67⅝ "  67¾  "    250 }  495 at 67⅝
  "  67¾ "  67⅞  "    250 }
  "  67⅞ "  68   "    243 }  493 at 67⅞
  "  68  "  68⅛  "    242

and that these had been tabulated as in the last column, the
mode would appear as 67⅝ in. ; while the same figures tabulated
as before gave it as 67¾ in. The probability of some such
shifting is seen from the original grouping, where the number at
67½ in. is greater than that at 68 in. From this discussion we
may see that the mode is always a little indefinite, depending on
the width of the groups in which the items are tabulated, and on
the exact position of the limits of the groups. As the items we
deal with become more numerous, we shall find regularity when
they are tabulated in narrower groups, and the mode can be
assigned with greater accuracy. A more satisfactory method of
determining the mode is that given on p. 155.

[Side-note: The "average man."] Now is the "average workman" the man who earns $1.73
per diem, the simple average of the whole group on p. 120, or a
man making $1.20, the mode ? In ordinary speech
the latter is meant. The "average clerk" is not
the one whose measurable qualities are an arithmetic mean of 
all similar qualities, but one whose qualities are found in the 
same degree in the greatest number of his fellows. There are 
more clerks who read the evening paper than who read Homer, 
more who go to music-halls than to oratorios, more whose 



AVERAGES. 1 23 

incomes are ;^ioo than £soOy more who live four miles from 
the City than one or twenty. Even with this explanation the 
average man is not a real creature, for fortunately no individual 
has no qualities out of the common. The fact that the average 
is a pure abstraction is of importance directly we apply statistics 
to actual affairs ; these American workpeople cannot be legislated 
for in the mass as if they all earned $1.20, or as if those who 
were alike in this did not differ in other respects, even doing very 
varying quantities of work for this wage.
[Side-note: Importance of the mode.] No single measure-
ment expresses completely even the economic condition of a
group of workmen, but if we are taking a single
measurement, that of the "mode" is often the
most useful. It is at the mode that we find the greatest number
of whose greatest good we may be thinking. Whereas the 
arithmetic mean and the " median " may correspond to no reality 
but be merely numerical conceptions, the mode is precisely that 
number for which most instances can be found. It shows the 
commonest result, that most often obtained, and is of very 
general application. For an intending passenger by train or 'bus, 
it is more important to know the most ordinary than to know the 
average number in a compartment. The mode rather than the 
average in chest measurements is the number most suitable for 
the ready-made clothier. For providing a post-office or a store, 
the mode in postal orders or prices of tea needs to be known 
rather than any other average. Even the favourite coin in a 
collection may show the spirit of the congregation better than 
the arithmetic average of their contributions. In these last 
instances it may be noticed that the mode is quite definite. 

[Side-note: Advantages of the mode.] A special feature of the mode is that it is entirely uninfluenced
by extremes. A cheque for £1,000 in a collection disturbs the
arithmetic average, but not the mode. The incomes
of a small number of millionaires and an army of
paupers may have the same arithmetic average as a nation composed
entirely of people moderately well off ; but the modes will
be very different in the two cases. In considering the change
year by year in a group of figures, as for instance, the wages of a 
large group of workmen, we cannot tell, if we take the arithmetic 
average as our criterion, whether an improvement is due to a 
levelling up of the badly paid or a rapid increase for those who 
were already well off, while the mode will show the changing 
position of the main body. Mr Booth's "London" is crowded
with instances of the use of the mode. Each age diagram shows
the mode in ages for an occupation; each wage list that in 
wages. His whole description of Class E, the typical workmen 
of modern towns, is based on the same principle. His measure- 
ment of social status, based on the number of rooms occupied or 
servants employed, can be used easily for stating the mode (four 
rooms to a family and no servant) but not any other average. 

[Side-note: Shortcomings of the mode.] An objection to this average is that there are many groups
of figures to which it is not applicable. If we have a very irregular
group of numbers with no particular type,
such as the populations of towns in England,
the mode would be quite indefinite, and would give no information
of importance. The use of the mode is to indicate the type
from which other figures may be regarded as diverging. Thus, 
in these wage figures, the type is about $1.20, and other examples 
lie on either side, wages of men who have for some reason or 
other more or less than the normal degree of skill or opportunity. 
If there is a type, as in Quetelet's instances, the mode will show 
it. The mode only tells us one fact, however, about each type, 
whereas the methods already given (p. 92) show us several. 

D. The Median. — When we are dealing with a group of 
persons or things, each of which possesses some measurable 
attribute, such as height or wage, we can choose certain quantities
which describe the group in brief. Suppose all the items
arranged in a series in ascending order of the magnitude of this
attribute ; the magnitude appertaining to the item half way up
the series is called the median.* Thus if in a group of wage-earners
200 earn less than 20s. 3d., one earns 20s. 3d., and 200
more, 20s. 3d. is the median wage. There are as many items
below 20s. 3d. in the supposed series as above it. The magnitudes
one-quarter and three-quarters up the series are called the
quartiles; * those one, two . . . nine-tenths up are the deciles; 
those one, two . . . ninety-nine hundredths up are the per- 
centiles. The median is more definite in position than the mode. 
When we are dealing with exact measurements, if we have an 
odd number of items it is the middle one, if an even number, it 
lies between the two middle items, which are in general near 
together, or coincides with them if they are equal. If the magni- 
tudes are not given exactly, but as within small limits, we can by 
the method described on pp. 127-8 make a good estimate of the 
actual values of these averages. The median is not affected by
exceptional entries at all ; the existence of any number of
millionaires has no more effect on the median income than of an
equal number of any other persons whose incomes are above the
median. For many purposes it is of course necessary to allow
these extreme instances more weight than those which are nearer
the average ; but the arithmetic average often gives them undue
weight for this democratic age, since a single millionaire can
counterbalance thousands of ordinary working men. A further
advantage is that it is extremely simple to find, not needing
much arithmetical work, for we need not do more than count
those well above and well below the average, and look more
carefully at those near it.

* These quantities have already been used in tabulation, p. 95.
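
The definitions above may be tried on the example of the 401 wage-earners. The sketch below, in Python, is an illustration only: the individual wages other than the middle one are hypothetical (the text gives only their number), amounts are in pence (20s. 3d. = 243d.), and the quartiles and deciles are computed by the standard library's `statistics.quantiles`, whose interpolation rule is one convention among several and is not necessarily the author's.

```python
import statistics

# 200 hypothetical wages below 243d., one at 243d., and 200 above it.
below = list(range(43, 243))
above = list(range(244, 444))
wages = below + [243] + above

median_wage = statistics.median(wages)           # the middle (201st) item
quartiles = statistics.quantiles(wages, n=4)     # three cut points
deciles = statistics.quantiles(wages, n=10)      # nine cut points

# Replace the highest wage by a millionaire's income: the median is
# unmoved, while the arithmetic average is carried far upward.
with_millionaire = wages[:-1] + [10_000_000]

print("median:", median_wage, "->", statistics.median(with_millionaire))
print("mean:  ", statistics.mean(wages), "->", statistics.mean(with_millionaire))
print("quartiles:", quartiles)
```

The median remains 243d. (20s. 3d.) after the change, since the millionaire counts for no more than any other person above the median.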

[Side-note: No need for complete information.] There is a yet more important advantage in the use of the
median ; it can often be found exactly, when our information as
to the items in question is neither accurate nor
complete. This will be clear from one or two
examples. It may be that in the "wage census"
100,000 persons, whose wages were far below the average, 
did not come into the returns at all, and it is very difficult 
to estimate their effect on the arithmetic average, for want 
of information as to their earnings ; but to find the median 
exactly, we need only know their number, not their earnings ; 
and if we can only assign a maximum for their number, we still 
can place the median within narrow limits. The addition of 
100,000 men with wages below 15s. to the general summary for 
the 356,000 men, would still leave the median in the group 
20s. to 25s. where it already is ; the change would be very
marked, however, in the lower deciles and quartiles, and the
arithmetic average would be lowered by at least 2s. 1d. The
same argument applies to incomes ; information is often very 
deficient, but it is in many cases possible to assert that a number 
of men, whose exact income is unknown, receive above a certain 
assigned sum, or even between two assigned limits, which is all 
we need to know about them to determine the median, if it lies 
below the lower limit. 

Again, in tracing the history of wages throughout the century 
it is often very difficult to find the correct average, but at the 
same time it is frequently possible to say that a very large class 
of men earned below, say, 15s. a week, and another very large
class above 30s. whose wages we do not exactly know, and a
more definite number between 15s. and 20s., and 25s. and 30s. ;
and in order to find the median all we need to do is to investigate
more exactly the wages between 20s. and 25s., and even if
we have not complete information here, we can still say that the 
median certainly lies between certain narrow limits.
[Side-note: Incommensurable quantities.] There is
yet another advantage, perhaps more important, that the median
is applicable to quantities which are not capable
of measurement at all. This development is especially
due to Mr F. Galton.* Suppose it to be required, for example,
to find among a large class of boys the average in intelligence.
It is clear that it is not easy to find the arithmetic average of a 
quantity which cannot be properly measured even by the most 
elaborate system of marks, but on the other hand it would not 
be at all difficult with a class of, say, twenty boys, to place them 
in order of intelligence without committing oneself to such a 
statement as that A.'s cleverness was 25 per cent. more than
B.'s ; and the tenth or eleventh boy in this arrangement
would show the style of boys in the class, at least as well
as any other average. The disadvantage of this method, the
reason why it is not universally applicable, is that the median
of a series of observations may be totally removed
from its type, and in fact may not be situated near
any of the different objects which are observed. Thus, if we
had two large groups of wages of a thousand men between 15s.
and 25s., and another thousand between 35s. and 45s., the median
would give us any position between 25s. and 35s., where as a
matter of fact not a single wage-earner would be found. The
median is then chiefly useful when we are dealing with a series
of objects of which the main part lie fairly close together ; a few
extremes do not affect it.†

If $m$ is the median and $a$ the arithmetic average of $n$ quantities $x_1, x_2, \ldots, x_n$,
and we call $x_1 - z$, $x_2 - z$, $\ldots$ the deviations of the $x$'s from any quantity $z$,
then $m$ is the value of $z$ which makes the sum of the deviations (all taken
positively) a minimum, $a$ is the value which makes the sum of the squares a
minimum. The first statement is easily verified ; for the second, we notice
that $\sum x = na$, and that $\sum (x - z)^2 = \sum x^2 - na^2 + n(a - z)^2$, which is a minimum
when $z = a$.
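
Both statements can be checked numerically. The following Python sketch is an illustration on five invented numbers, not a proof: it searches a grid of trial values of $z$ for the one giving the least sum of absolute deviations and the one giving the least sum of squares, and verifies the identity quoted for the squares.

```python
import statistics


def sum_abs_dev(xs, z):
    """Sum of the deviations from z, all taken positively."""
    return sum(abs(x - z) for x in xs)


def sum_sq_dev(xs, z):
    """Sum of the squares of the deviations from z."""
    return sum((x - z) ** 2 for x in xs)


xs = [1, 2, 3, 4, 100]            # hypothetical quantities, one extreme
n = len(xs)
m = statistics.median(xs)         # median
a = statistics.mean(xs)           # arithmetic average

# Trial values of z from 0 to 100 in steps of a quarter.
candidates = [k / 4 for k in range(0, 401)]
best_abs = min(candidates, key=lambda z: sum_abs_dev(xs, z))
best_sq = min(candidates, key=lambda z: sum_sq_dev(xs, z))

# Largest discrepancy in: sum (x - z)^2 = sum x^2 - n a^2 + n (a - z)^2.
sum_x2 = sum(x * x for x in xs)
identity_gap = max(
    abs(sum_sq_dev(xs, z) - (sum_x2 - n * a * a + n * (a - z) ** 2))
    for z in candidates
)

print("median", m, "minimises absolute deviations at z =", best_abs)
print("mean", a, "minimises squared deviations at z =", best_sq)
```

The single extreme item draws the arithmetic average far from the body of the figures while leaving the median where it was.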

The following table shows the description of 76 items by the 
help of the various averages now described : — 

* See, for instance, Natural Inheritance, p. 47.

† On the relative advantages of this, and a more mathematical method,
see Yule and Galton in the Statistical Journal for 1896, especially pp.
392-398.



Measurements of Boys of Ages 13 to 15 Years.

[In this copy many fractions are illegible. "+" after an entry marks a fraction that cannot be read ; "[?]" marks an entry that cannot be read at all.]

No.  Age        Height   Weight    |  No.  Age        Height   Weight
     yrs.mth.   ft.in.   st.lb.    |       yrs.mth.   ft.in.   st.lb.
 1   14.1       4.11+    6.0+      |  39   14.7       4.11½    6.3½
 2   14.9       4.10     5.7       |  40   13.1       4.11+    5.7
 3   14.7       5.5+     7.5       |  41   14.3       4.11     6.4+
 4   13.11      5.0      6.3+      |  42   13.3       4.4½     4.11½
 5   14.11      5.3+     8.0+      |  43   14.3       5.3      6.7½
 6   14.7       4.10     5.0       |  44   13.6       5.1½     [?]
 7   14.3       4.10     6.7       |  45   14.2       4.8+     6.0+
 8   14.9       5.5      8.5+      |  46   13.5       5.2      7.4
 9   14.11      4.9+     5.12+     |  47   13.8       5.2½     6.11
10   14.3       4.11+    6.11+     |  48   14.6       5.4      7.4½
11   13.4       4.7      [?]       |  49   14.8       5.1½     6.10
12   14.7       5.3+     7.8+      |  50   13.3       4.8+     5.0
13   13.8       4.7+     5.3       |  51   13.0       [?]      6.7
14   14.5       5.2+     7.8+      |  52   13.10      4.11½    7.3+
15   14.4       5.0      6.0       |  53   14.8       4.11½    6.9+
16   13.6       4.9      5.6       |  54   13.8       4.5+     4.9½
17   14.0       5.2+     7.7½      |  55   14.8       5.4½     [?]
18   13.0       4.8+     5.3       |  56   14.0       4.10     6.2½
19   14.7       4.11     6.12+     |  57   13.10      4.9      5.5
20   14.10      5.1      6.9       |  58   13.2       5.0½     6.4
21   13.9       4.11     5.11      |  59   13.6       4.7      5.2½
22   14.10      4.8+     [?]       |  60   13.0       4.9      5.9+
23   13.4       4.9+     5.8+      |  61   13.3       4.8+     5.5+
24   13.1       5.2+     6.1       |  62   13.5       4.8½     6.5+
25   14.0       4.6+     5.6½      |  63   13.10      5.5½     7.10½
26   14.6       5.3½     7.6½      |  64   13.1       4.8+     6.2+
27   14.3       5.0+     5.11+     |  65   13.10      5.4      7.2
28   13.9       4.9      5.11      |  66   14.0       4.9      5.0½
29   13.4       [?]      5.9       |  67   13.3       4.7      5.0
30   14.4       5.1      6.8+      |  68   13.8       4.11     [?]
31   14.10      4.9½     4.7+      |  69   13.7       4.11+    6.4+
32   13.2       4.9½     5.13+     |  70   13.11      4.8      4.4½
33   14.1       4.8+     5.8½      |  71   13.11      4.8      4.4½
34   13.10      5.2+     6.8+      |  72   13.2       4.7+     4.10
35   14.0       4.11½    5.7       |  73   14.0       4.11     6.5+
36   14.4       4.11     6.5       |  74   13.3       4.3½     [?]
37   14.8       4.11     6.0+      |  75   13.3       5.2      7.2+
38   13.7       5.0+     6.2       |  76   13.7       4.8+     5.6

Tabulation of Weights.

Arithmetic average, 6 st. 1½ lbs.
The same, when weights are entered only to nearest stone, 6 st. 1+ lbs.
Median, 6 stones 1+ lbs.
Quartiles, 6 st. 9+ lbs., 5 st. 6½ lbs.
Average of quartiles, 6 st. 1 lb.
Half of the examples lie within 9 lbs. of median.
Mode is between 6 st. and 6+ st.
Average weight between ages 13 and 13½ years, 5 st. 9+ lbs. ; 13½ and 14 years, 5 st. 13½ lbs. ; 14 and 14½ years, 6 st. 3½ lbs. ; 14½ and 15 years, 6 st. 8+ lbs.
Heights may be tabulated in the same way.

[Side-note: Graphic method.] A graphic method of finding the median closely is given by
Mr Galton in the Report of the Anthropometric
Committee of the British Association, 1881, p. 247 ;
and is illustrated by the diagram facing the next page.

On a horizontal line mark off equal intervals represent- 
ing units of measurement, say inches. On a vertical scale 
mark off equal intervals representing the number of instances, 
e.g., persons whose heights are measured. Beginning at the
lowest, say 51¼ inches, on an imaginary vertical line mark as
many dots at equal intervals on the vertical scale as there are
persons at that height, so that each dot represents one person.
From the highest dot thus marked, suppose a horizontal line
drawn till it is over the next height division, 51½ inches, and
with this new base proceed as before, marking each instance
at 51½ inches by a dot vertically above the 51½-inch mark.
Next draw a connected line through the middle points of the 
consecutive vertical rows of dots ; if there is an odd number 
of dots, the middle one is taken as the middle point ; if an even 
number, the middle point is half-way between the middle ones. 

On the vertical scale mark the positions of the median, 
quartiles, &c., obtained by dividing the distance representing 
the total number of instances into appropriate parts, and 
through these points draw horizontal lines to intersect the 
connected line already drawn. The points of intersection 
lie vertically above the heights required, as marked on the 
horizontal scale. 

Now it may be assumed that the heights of all persons 
returned at, say, 58¼ inches, are in reality evenly distributed
between the limits 58⅛ and 58⅜ inches, heights lying within
which would be so returned ; and it can be verified that the 
construction just given shows the place of the median, deciles, 
&c., almost exactly on this hypothesis. 
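
The hypothesis of even distribution can also be applied by direct calculation instead of by drawing. The Python sketch below is an arithmetical counterpart under that assumption, not a transcription of Mr Galton's diagram; the function name is invented here, and for data it borrows the supposed returns of men's heights given earlier under the mode (quarter-inch groups from 67 to 68½ inches), since the boys' heights are not fully legible.

```python
def grouped_quantile(centres, counts, width, p):
    """Value below which a fraction p of the items lie, assuming the items
    returned at each centre are spread evenly over centre +/- width / 2."""
    total = sum(counts)
    target = p * total
    running = 0
    for centre, count in zip(centres, counts):
        if count > 0 and running + count >= target:
            lower = centre - width / 2
            return lower + width * (target - running) / count
        running += count
    raise ValueError("p must lie between 0 and 1")


# Supposed returns of heights, entered to the nearest quarter inch.
centres = [67.0, 67.25, 67.5, 67.75, 68.0, 68.25, 68.5]
counts = [455, 475, 490, 500, 485, 467, 445]

median = grouped_quantile(centres, counts, 0.25, 0.5)
lower_quartile = grouped_quantile(centres, counts, 0.25, 0.25)
upper_quartile = grouped_quantile(centres, counts, 0.25, 0.75)

print(f"median {median:.3f} in., quartiles "
      f"{lower_quartile:.3f} and {upper_quartile:.3f} in.")
```

The median so found falls inside the group returned at 67¾ inches, a little below its centre.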

E. Geometric Mean. — It is not necessary to give a long 
discussion of the geometric or logarithmic mean, for its applica- 
tion is limited to a small class of figures which will be best 
dealt with at a later stage.* It was used by Jevons in 
his essay on the Fall in the Value of Gold, but he did 
not justify or explain its use.† If we have $n$ quantities
$a_1, a_2, \ldots, a_n$, their geometric mean is $\sqrt[n]{a_1 a_2 \cdots a_n}$. Its
chief advantage is that the influence of large numbers is
diminished and of small numbers increased, when the geo-
metric mean is employed instead of the simple average.
Suppose all the following groups of numbers represent price
levels of various commodities as percentages of their height
at a previous date :

Numbers.                                        Arithmetic    Geometric
                                                  Mean.         Mean.
80, 160                                            120           113
80, 80, 100, 324                                   146           120
20, 20, 80, 80, 120, 120                           73.3           57
20, 20, 80, 80, 100, 100, 120, 160, 324, 972       198           104

* See p. 223, infra. † But see Statistical Journal, 1865.

[Diagram: Graphic method of finding median, quartiles and deciles (after Galton : Anthropometric Committee : Brit. Assn.), for the heights of the 76 boys, between ages of 13 and 15, stated on p. 127. Values marked: median 59 inches (fraction illegible) ; quartiles illegible ; "probable error" 2.1 ; deciles 55.5, 56.6, 57, 57.9, 59.7, 60.6, 62, 63.6 ; arithmetic average 59.095 ; greatest density 57 or 59, in smoothed curve would be about 58 ; geometric average 58.98.]

A consideration of the last list leads to the conclusion that the
general rise of price cannot be 98 per cent., while 4 per cent. may
reasonably represent it. A tentative rule may be suggested : when
the geometric mean differs much from the arithmetic it should be
preferred. It should be calculated with the help of logarithms.
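
The calculation by logarithms may be sketched as follows in Python. This is an illustration of the rule just stated, using the four groups of numbers from the table; the function names are chosen here, and natural logarithms are used where the author would have used tables of common logarithms, which makes no difference to the result.

```python
import math


def arithmetic_mean(values):
    """Simple average of the values."""
    return sum(values) / len(values)


def geometric_mean(values):
    """n-th root of the product, found as the antilogarithm of the mean logarithm."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


groups = [
    [80, 160],
    [80, 80, 100, 324],
    [20, 20, 80, 80, 120, 120],
    [20, 20, 80, 80, 100, 100, 120, 160, 324, 972],
]

for numbers in groups:
    print(f"{arithmetic_mean(numbers):7.1f} {geometric_mean(numbers):7.1f}  {numbers}")
```

For the last group the arithmetic mean is near 198 and the geometric mean a little above 104, the two figures contrasted in the text.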

F. Statistical Coefficients. — Before leaving the sub-
ject of averages, we must pay some attention to "statistical
coefficients." A statistical coefficient is a number, whole or
fractional, by which a total (e.g., population) must be multi-
plied to give an allied number (e.g., number of births). Thus,
if the birth-rate is 40 per 1,000, the coefficient is .04. These
coefficients play an important part in ordinary statistics and
a very interesting rôle in the application of the law of error
to demography. The population may increase or diminish, 
but the coefficients relating to certain numbers remain almost 
unchanged,* and by their use the statistics of different coun- 
tries may be compared, and numbers for future years can 
be forecasted in some cases with marvellous accuracy; the 
numbers of births, marriages, deaths in 1901 can be written 
down before their occurrence as exactly as they are needed, 
subject only to the chance of some great catastrophe. Coeffi- 
cients can be formed for births (in various districts), for deaths 
(according to age, profession, or disease), for marriages (at 
various ages), for suicides, crimes, accidents, consumption of
various commodities ; if the preliminary data could be obtained, 
for the number of persons crossing Westminster Bridge in the 
year, the number of visitors to the Monument, the number of 
umbrellas left in the train, and so on ; the list could be 
prolonged indefinitely. The more important coefficients are 
calculated for most civilised countries, and the rates on which 
they are based published in statistical abstracts. A knowledge 
of them is necessary for statistical investigations. 
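The arithmetic of a statistical coefficient can be put in a few lines. The birth-rate of 40 per 1,000 is the figure used in the text; the population of 250,000 and the function names are hypothetical, chosen only to show a forecast.

```python
def coefficient(allied_number, total):
    """Number by which the total must be multiplied to give the allied number."""
    return allied_number / total


def forecast(coeff, total):
    """Allied number expected for a given total, assuming the coefficient holds."""
    return coeff * total


# 40 births per 1,000 of population, as in the text.
birth_coefficient = coefficient(40, 1000)

# Hypothetical population for which a forecast is wanted.
expected_births = forecast(birth_coefficient, 250_000)

print(birth_coefficient)       # 0.04
print(round(expected_births))  # 10000
```

The forecast is only as good as the assumption that the coefficient remains almost unchanged from year to year.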

[Side-note: … of coefficients.]
A useful caution is given by Dr Bertillon.† In order that
a coefficient may obey the laws of coefficients closely, the
number to which it is to be applied should not be that of the
total population, but the number of persons or things capable
of affording an instance of the resulting total. Suppose $m$ to
be this number of persons, $c$ the coefficient, $n$ the resulting
total, then $n = cm$ and $c = n/m$. Thus,

* See infra, Part II.    † Cours élémentaire, p. 94 seq.


for the marriage rate, $m$ should not be the total population,
but the number of marriageable people. The importance of
this rule is, however, not great as far as simple calculations
are concerned; for the less accurate coefficient can be easily
seen to be nearly constant. Suppose $M$ to be the total population,
$m$ the number of marriageable people, $n$ the number
of marriages. Let $c_1$ be the coefficient for calculating the
number of marriageable people, then $c_1 = m/M$; $c_2$ the coefficient
for marriages on Bertillon's principle, then $c_2 = n/m$; and
$c_3 = n/M$ the more usual coefficient for marriages. Then
$c_3 = \frac{m}{M} \times \frac{n}{m} = c_1 \times c_2$. Now
if $c_1$ and $c_2$ are invariable, so also is $c_3$. If, however, one of
the factors, say $c_1$, is more variable than the other, then $c_3$
varies as much as $c_1$, and the greater constancy of $c_2$ is not
discovered, if only $c_3$ is calculated.
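A small numerical sketch may make the relation between the three coefficients concrete. All the populations and marriage counts below are hypothetical; only the symbols M, m, n and the relation c3 = c1 × c2 come from the text.

```python
def coefficients(M, m, n):
    """Return (c1, c2, c3) for total population M, marriageable people m, marriages n."""
    c1 = m / M   # share of the population that is marriageable
    c2 = n / m   # marriages per marriageable person (Bertillon's principle)
    c3 = n / M   # the more usual coefficient, marriages per head of population
    return c1, c2, c3


# Two hypothetical years: c2 is unchanged, but c1 rises from .4 to .5.
c1_a, c2_a, c3_a = coefficients(M=1_000_000, m=400_000, n=8_000)
c1_b, c2_b, c3_b = coefficients(M=1_000_000, m=500_000, n=10_000)

print(c1_a, c2_a, c3_a)
print(c1_b, c2_b, c3_b)
print(c3_b / c3_a, c1_b / c1_a)   # c3 has moved in the same proportion as c1
```

In this made-up case the usual coefficient changes by a quarter although the coefficient on Bertillon's principle has not moved at all, which is the concealment the paragraph describes.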

[Side-note: The function of averages.]
G. General. — The function of averages will now be clear;
it is to express a complex group by a few simple numbers. The
mind cannot grasp the magnitudes of millions of items at once;
they must be grouped, simplified, averaged. The averages chosen
must be those which will give the striking features and the
essential characteristics of the group. Different methods will
apply to groups of various classes; each must be taken on its
own merits. A good and suitable average has the following
characteristics: if there is a type it shows it; it gives due
influence to extreme cases; it is not easily affected by errors
or much displaced by slight alterations in systems of
calculation; and it is easily calculated.

The relative positions of the different kinds of averages dealt
with give some information as to the general nature of the group
to which they refer. The arithmetic average, median and mode, 
are close together, if the group is symmetrical. The arithmetic 
average is probably above the median, if we have a small group 
at a high degree. The arithmetic average is generally below the 
median, if there is an absence of high numbers, and a concen- 
tration a little above the average. The mode will be badly 
defined, if our group is not homogeneous. The mode will pro- 
bably be below the arithmetic average, if there is a small group 
at a high degree. The mode is well marked, if the distribution is 
uniform. These rules are only tentative and easily nullified by 
exceptional circumstances. 
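Two of these tentative rules can be checked on a small made-up group. The figures below are hypothetical and serve only to show a symmetrical group beside one with a small group at a high degree; the standard-library `statistics` module is used for the three averages.

```python
from statistics import mean, median, mode

# Hypothetical symmetrical group.
symmetrical = [3, 4, 4, 5, 5, 5, 6, 6, 7]

# The same group with a small group added at a high degree.
with_high_group = symmetrical + [20, 21]

for name, group in (("symmetrical", symmetrical), ("with high group", with_high_group)):
    print(name, round(mean(group), 2), median(group), mode(group))
```

In the first group the three averages coincide; in the second the arithmetic average rises above both the median and the mode, as the rules lead one to expect.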



CHAPTER VI. 

SOME EXAMPLES OF THE USE OF 
AVERAGES IN TABULATION. 




[Side-note: Application of averages to train service.]
If our analysis of the nature and use of averages is complete,
and if averages are of widely extended use, we should now be
able to express almost any group of figures by a few well-chosen
numbers of definite significance.

To apply a somewhat severe test at first, let us choose
a familiar example from ordinary life, and consider how a
suburban business man might test the merits of two railway
systems, by one of which he intended to take a season ticket.

The following table gives the train service between Leatherhead
and London in 1898:

Train Service — Leatherhead to London.

Number of Minutes to Journey.

Waterloo:
  Down: 60, 50, 52, 48, 47, 61, 50, 44, 48, 53, 45, 42, 45, 49, 43, 48, 42, 43.
  Sundays: [?]0, 50, 47, 49, 50.
  Up: 51, 46, 51, 48, 43, 44, 48, 48, 64, 45, 48, 47, 45, 47, 46, 47.
  Sundays: [?], 48, 51, 51, 51.

London Bridge:
  Down: 6[?], 65, 65, 61, 74, 51, 56, 66, 65, 53, 59, 41, 49, 44, 58, 57, 5[?], 67, 80.
  Sundays: [?]7, 52, 66, 68, 88, 65, 65, 68, 65.
  Up: [?], 57, 53, 58, 54, 41, 58, 52, 42, 40, 55, 67, 79, 98, 69, 66, 68, 64, 71.
  Sundays: [?]2, 71, 69, 70, 62, 81, 73, 73.

Victoria:
  Down: [?], 65, 55, 76, 77, 88, 48, 53, 46, 69, 89, 54, 82, 71, 9[?].
  Sundays: [?]2, 45, 81, 84, 78, 61, 85, 83, 85.
  Up: 87, 65, 69, 69, 47, 48, 51, 83, 101, 58, 62, 61, 76, 103.
  Sundays: 8[?], 76, 80, 85, 85, 82, 94.

The following table gives us the necessary information:

                                   London
                                   Bridge.   Victoria.   Waterloo.
                                    Min.       Min.        Min.
Average of four quickest trains      41        4[?]        42½
Lower decile                         47½       48          43
Median                               65        77          48
Mode                                 61        ...         48
Number of trains on week days        [?]       29          34
General average                      63        73          48
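The Waterloo column of such a summary can be recomputed along the following lines. This is a sketch under stated assumptions: the week-day times are those of the Waterloo service as read from a damaged copy of the list above (several digits restored), Sunday trains are left out, and the variable names are illustrative.

```python
from statistics import mean, mode

# Week-day journey times in minutes, Leatherhead and Waterloo
# (digits restored from a damaged scan, so treat them as an assumption).
waterloo_down = [60, 50, 52, 48, 47, 61, 50, 44, 48, 53, 45, 42, 45, 49, 43, 48, 42, 43]
waterloo_up = [51, 46, 51, 48, 43, 44, 48, 48, 64, 45, 48, 47, 45, 47, 46, 47]

times = waterloo_down + waterloo_up

number_of_trains = len(times)               # 34
four_quickest = mean(sorted(times)[:4])     # 42.5
most_frequent = mode(times)                 # 48
general_average = mean(times)               # a little over 48

print(number_of_trains, four_quickest, most_frequent, round(general_average))
```

These four figures agree with the Waterloo entries of the summary; the median and lower decile depend on whether Sunday trains are counted and on the rounding convention, so they are not recomputed here.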






It is to be noticed that the statistical method is generally 
limited to one aspect of a problem ; the question of punctuality 
might, indeed, be easily treated statistically, but the questions 
of comfort and relative picturesqueness of route will elude our 
analysis. 

The next example shows a method of throwing into relief 
the characteristics of a typical group of sociological data. 

[Side-note: Tabulation of wages returns.]
The adjoining table gives the wages recognised by the
Amalgamated Society of Engineers in many of their branches in
1862 and 1891.

Amalgamated Society of Engineers. — Wages in 1862 and 1891, 

Weekly, exclusive of Overtime. 





1862.


1891. 




1862.


1891. 




s. 


d. 


s. d. 




s, d. 


s. d. 


Accrington 


' 27 





31 


Faversham • ■ 


' 34 


33 


Ashford - 


■ 33 


6 


30 


Folkestone 


■ 34 


32 


Ashton-under-Lyne ■ 
Bacup 


' 29 
■ 26 


3 

I 


34 
28 


Frome 


• 24 


27 
■ 30 


Barrow-in-Furness • 


■ 31 





34 9 


Gainsborough - 


■ 27 6 


28 


Bath 


• 29 





31 


Glossop - 


■ 27 2 


32 


Bedford ■ 


■ 27 





29 


Gloucester 


■ 28 


32 


Bilston 


• 28 





30 


Grantham - 


. 28 6 


30 4 


Bingley - 


' 24 





29 


Grimsby - 


. 28 


32 


Birkenhead 


■ 29 





35 6 


Halifax - 


23 I 


31 


Birmingham 


• 32 





36 


Hanley 


• 28 3 


32 


Blackburn 


• 27 


6 


32 


Hartlepool 


- 26 


34 10 


Bolton 


• 27 


6 


/28 
1 32 


Heywood - 


■ 27 


f30 
134 


Bridgwater 


• 24 


6 


24 


Holyhead - 


• 32 


28 


Brighton - 


' 24 


8i 


29 


Huddersfield - 


- 26 


26 


Bristol 


• 31 





32 


Hull- 


■ 27 6 


34 


Burnley - 


• 27 





30 


Hyde 


f30 
128 


30 


Burton-on-Trent 


■ 25 





30 


28 


Bury 


. 28 


3 


130 
132 


Ipswich - 
Keighley - 


• 28 6 

• 23 


28 
27 


Cardiflf - 


■ 31 





34 


Kidderminster - 


- 28 


30 


Carlisle - 


• 24 


6 


30 


Lancaster - 


25 


32 


Chepstow - 


• 30 





34 


Leeds 


25 


30 


Chester 


' 30 





32 


.Leicester - 


• 26 


31 6 


Chowbent - 


. 26 





32 


Leigh 


• 27 9 


31 6 


Colne 


25 





31 


Lincoln - 


■ 26 7 


28 6 


Congleton 


. 24 





28 


Liverpool - 


■ 29 


34 


Coventry - 


' 28 





34 


Llanelly - 


• 22 


26 


Crewe 


• 29 


4 


30 


Macclesfield 


> 24 


29 6 


Darlington 


25 





31 6 


Manchester 


• 29 9 


35 


Dartford - 


34 





38 


Mexborough 


. 27 


32 


Darwen - 


27 





32 


Middlesborough 


• 25 


34 


Derby 


26 





29 


Middleton- 


• 29 5 


33 


Doncaster - 


- 28 


6 


31 6 


Milton and Elsecar • 


. 28 


34 


Dover 


• 35 


6 


36 


Neath 


• 32 


30 


EnBeld Lock - 


36 





40 6 


Newark - 


■ 25 


29 


Exeter 


23 





/28 
.32 


Newcastle - 


- 25 


J35 
137 








1862.


1891. 




1862.




s. 


d. 


s. 


d. 




s. d. 


New Holland • 


• 30 


8 


34 





Stafford . 


' 34 


Newport - 


• 30 





32 





Stalybridge 


■ 28 3 


New Town (Stockf 
Newton Abbott - 


)ort) 29 
■ 33 






32 
33 






Stockport • 


• 28 


Northampton - 


• 26 





32 





Stockton-on-Tees 


' 24 


Northfleet- 


- 36 





36 





Stoke-on-Trent - 


• 29 


North and So. Shic 


dds 26 





35 





Stroud and Thrupp • 


• 26 


Norwich - 


- 32 





29 





Swindon - 


■ 31 6 


Nottingham 


• 27 


5 


34 





Todmorden 


■ 26 


Oldbury - 


• 28 





34 





Wakefield - 


■ 25 


Oldham - 


- 29 





33 





Warrington 
Watford - 


• 28 


Peterborough • 


. 28 


6 


33 





■ 35 


Plymouth - 


• 32 





33 





Wednesbury 


- 26 


Pontypridd 
Portsmouth 


• 24 
■ 35 






30 
34 






Whitehayen 


■ 25 


Preston - 


• 27 





32 





Wigan 


• 28 


Radcliffe Bridge 


- 27 





r3o 

I 32 






Wolverhampton 
Wolverton 


28 
• 29 2 


Reading - 


. 28 





/32 

134 






Worcester - 
Bermondsey 


■ 31 
• 35 4 


Ripley 


- 26 





26 


6 


Blackwall - 


■ 34 


Rotherham 


- 27 


6 


32 





Bow - 


• 36 


Rugby 


- 32 





(28 

1 32 






Greenwich 
King's Cross 


• 34 

• 36 


Rugeley - 


• 24 


II 


30 





Lambeth - 


• 35 8 


St Helens - 


- 28 





/34 
136 






London, E. 
» N. 


■ 35 

■ 35 10 


Sheffield • 


- 28 





36 





» s. 


■ 35 


Shipley - 


- 25 


9 


r28 
130 






,. w. . . 

Marylebone 


35 6 
• 33 


Shrewsbury 
Smethwick 


■ 30 
• 28 


6 



32 
35 






Stratford - 


•0^ t 
133 6 


Southampton • 


• 32 





34 


6 


Tower Hamlets 


■ 36 6 


Sowerby Bridge 


■ 24 


6 


30 





Woolwich - 


• 36 



1891.

s, d. 



{ 



30 

2 

34 
36 

•32 
30 

31 

28 

30 

34 

36 

31 
/28 

136 

34 

33 

29 

30 



o 
o 
o 
o 
o 
o 
o 
6 
o 
o 
o 
o 
o 
o 
o 
o 
o 
o 
o 



38 o 



The following figures show the same in brief:

                          1.         2.              3.
                        1862.*     1891.*          1891.†
                        s.  d.     s.  d.          s.  d.
Maximum                 36  6      40  6            ...
Upper decile            35  0      38  0           38  0
Upper quartile          31  4      34  0           36  0
Median                  28  0      32  0           34  3
Arithmetic average      28 10      32  4           33  4
Modes                   28  0      30 0 and 32 0    ...
Lower quartile          26  0      30  0           3[?] 6
Lower decile            24  6      28  6           30  0
Minimum                 22  0      24  0            ...

* Each branch counting as 1.

† The numbers of members in each branch counted as receiving
the wage recognised there.




If the rates at each branch were not those actually paid to 
all members, but their average while the actual wages were 
confined within small limits of that average, the figures in the 
last column would be little affected. 

On comparing columns 1 and 2 it will be seen that not
only have all the averages increased, but that since the lower
decile and quartile have increased more rapidly than the upper,
the lower half has also gained on the upper. Again the wages
are grouped more closely in column 2 than in column 1.

[Side-note: Dispersion.]
We need a simple and general method of describing the
distances of members of a group from their average and each
other, in order to compare one group with another. The fraction
$\frac{Q_3 - Q_1}{Q_3 + Q_1}$, where $Q_1$ and $Q_3$ are the
quartiles, supplies such a measurement, and may be called the
dispersion. It is easily verified that this fraction increases when
the ratio of the upper to the lower quartile increases, and that it
is sensitive to changes in a group, and that it lies between 0 and 1.
For the above figures this quantity is .093 in 1862, and
$\frac{34 - 30}{34 + 30} = .062$ in 1891. The dispersion diminished
in that period.

Other suitable measurements of dispersion would be (in the notation of
p. 126) [formula not legible in this copy], all the terms in the
numerator being taken positively, or [square-root formula not legible
in this copy] ÷ a. For the above figures in 1891 these become .078 and
.097 respectively. The methods will in general give different results, often
in the proportion here found, and the special method adopted must be
adhered to during an investigation.
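The quartile measure of dispersion is easily checked by calculation. In the sketch below the quartiles for 1891 (30s. and 34s.) are those quoted in the text; for 1862 the lower quartile is taken as 26s. and the upper as 31s. 4d., the latter being an assumption restored from a damaged figure and chosen because it reproduces the stated value .093. The function names are illustrative.

```python
def shillings(s, d=0):
    """Convert shillings and pence to decimal shillings (12 pence to the shilling)."""
    return s + d / 12


def dispersion(q1, q3):
    """Quartile dispersion: (Q3 - Q1) / (Q3 + Q1)."""
    return (q3 - q1) / (q3 + q1)


d_1862 = dispersion(shillings(26), shillings(31, 4))
d_1891 = dispersion(shillings(30), shillings(34))

print(round(d_1862, 3))   # 0.093
print(d_1891)             # 0.0625
```

The measure is a pure ratio, so it does not depend on the unit in which wages are expressed, and it falls when the quartiles draw together relatively to their sum.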

[Side-note: Tabulation of descriptive answers.]
Group C of Tabulation. — It was necessary to postpone
the tabulation of non-numerical or descriptive answers till we
had finished our discussion of averages. The following detailed
example shows how the median, &c., can be used to give a short
description of a large group of adjectival answers.

In 1891 the Amalgamated Society of Engineers obtained
from all their branches answers to the question: To what extent
is overtime worked? The branch secretaries sent answers which
may be tabulated as follows:

Answers.                      Number of     Number of
                              Branches.     Members.
None                              4            140
Not worked                        1             78
Very little                      23          4,836
To very limited extent            1             63
Very occasionally                 1            350
A little on repairs               1            500
Little                            2             73






Answers.
2 hours when necessary
Seldom
Small extent
Seldom except on repairs
Only on repairs
Not much
On repairs
Not to any extent
Not to a great extent
Not general
Not systematically
In cases of breakdown or emergency
2 hours regularly
Chiefly on repairs
Occasionally
When necessary
Casually (sic)
A good deal on repairs
Maximum 18 hours in 4 weeks
Moderately
Systematically in good trade
Average about 5 hours a week
Considerably in marine shops
Systematically in dockyard
General
Systematically
Great amount
To a great extent
Excessively
9 hours a week
12 hours a week (maximum)
14 hours a week (when busy)
10 to 18 hours a week
Total
Unclassed:
No answers
As little as possible
Not so much lately
In machine shops for six months
In steel works



Number of 
Branches. 
I 


Number of 
Members. 

80 


I 
I 


59 
16 


I 


66 


2 


216 


6 


1,125 


I 


500 


3 
2 


644 
162 


I 


7 


2 

7 


43 
606 


I 


136 


I 


20 


2 

I 


90 
348 


2 


142 


I 


23 


I 


1,000 


3 


262 


I 


200 


I 


96 


I 
I 
2 
I 
I 


400 
650 
146 

693 
263 


I 


72 


I 


550 


I 
I 


39 
106 


I 

I 


700 
106 


I 


5,000 


- 88 


20,666 


- 36 


5>"4 


I 


250 


I 


160 


I 


60 


I 


348 




[Side-note: Explanation of table.]
An inspection of the table here given will show sufficiently
the method of tabulation. The position of most of the answers
in an imaginary scale is fairly definite, except that it is not
always obvious where the numerical answers should be placed;
this must be decided either by internal evidence or practical
knowledge of the trade. The same adjectives did not of course
convey exactly the same numerical meaning to all the branch
secretaries who used them, but it will be admitted that this
tabulation gives a fairly clear view of the case, and that the
method of medians and quartiles may be appropriately applied.
Taking the member of a branch as the unit and neglecting the
unclassed answers, the median is "Maximum 18 hours in 4 weeks"
or "Moderately," the lower quartile "Very little," and the upper
quartile "14 hours when busy." Taking the branch as unit, the
median is "Not much," the quartiles are "Very little" and
"When necessary" or "Occasionally."

This method, which, with varying degrees of precision, is
widely applicable, seems to afford the only way of comparing
two such groups of answers. The precision attainable is to be
measured by the distance through which the median can be
shifted by making reasonable variations in the scheme of
tabulation.
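The method of medians and quartiles for descriptive answers amounts to running down the ordered scale and accumulating the numbers until the required fraction of the total is reached. A minimal sketch follows; the scale and the weights are hypothetical, not the Society's returns, and the function name is illustrative.

```python
def quantile_label(scale, q):
    """Return the answer at which the cumulative weight first reaches q of the total.

    `scale` is a list of (answer, weight) pairs arranged in order of intensity.
    """
    total = sum(weight for _, weight in scale)
    target = q * total
    running = 0
    for answer, weight in scale:
        running += weight
        if running >= target:
            return answer
    return scale[-1][0]


# Hypothetical scale of answers with the number of branches giving each.
scale = [
    ("None", 5),
    ("Very little", 18),
    ("Not much", 12),
    ("Occasionally", 8),
    ("Systematically", 7),
]

print(quantile_label(scale, 0.25))   # Very little
print(quantile_label(scale, 0.50))   # Not much
print(quantile_label(scale, 0.75))   # Occasionally
```

Using members instead of branches as the weights gives the second reading described in the text, and re-ordering doubtful answers in the scale shows how far the median can be shifted.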

Summarisation. — Now that we have the method of averages 
at our disposal we may use it for tabulating and summarising a 
group of figures. 

Consider, for example, the answers to the questions issued 
by the Commissioners on Trade Depression in 1886. 

Four of the questions were : — 

1. Number of men in Society. 

2. Number out of work in 1885. 

3. Weekly wage in 1885. 

4. Change in wages between 1865 and 1885. 

The following table shows the answers given by the branch 
secretaries of the Amalgamated Society of Engineers : — 






1.                    2.         3.         4.                  5.
District.             No. in     No. Out    Current Wages,      Change between 1865 and 1885.
                      District,  of Work,   1885.
                      1885.      1885.

Belfast                1,100       130      28/ to 36/          Slight increase.
Coventry               2,500       230      31/6                Contract work: 50% decrease.
Dukinfield               170+       20+     31/; 25/ skilled;   Slight increase.
                                            15/ unskilled.
Dundee                 1,400       45%                          Time work: 1865, 22/; '72, 24/;
                                                                '80, 26/; '83, 24/; '85, 25/.
Glasgow               28,000     4,000      26/                 Time wages, 5% above 1864.
Glasgow (St Rollox)    1,600       250      ...                 Rise in 1872-73 of 15%;
                                                                1885 same as 1865.
Hartlepool             1,200       400      31/6                Advance of 3/.
Glossop                  135        10      32/
Liverpool                280        38      ...                 Rise in 1872-73 of 7½%;
                                                                1885 same as 1865.
Monifieth                114        18      2[?]/               Skilled work: 1865, 24/; '76, 27/;
                                                                '78, 25/; '83, 28/; '85, 25/.
Nottingham             4,000       600      34/ minimum.        1865, 28/; 1885, 34/.
Oldham                 1,600        96      33/ average.        Increase of 5%.
Oxford                    45       ...      33/                 ...
Paisley                  800       ...      28/6                1865, 26/; 1885, 28/6.
Preston                  630        40      [?]                 None.
Preston                  900       120      28/                 None.
Shipley                  201        15      28/6;               1865, 28/6; 1869-73, 32/;
                                            24/ non-unionists.  1885, 28/6.
                                                                1865-75, 25/6; 1875-85, 28/.
Sowerby Bridge         1,120        43      28/
Sunderland             3,200       400      33/                 1864, 27/; '74, 34/; 1875-85,
                                                                between 31/ and 37/.
Swindon                6,050         2      31/6
Ulverston                 45       ...      31/                 1865, 26/; 1875, 31/.
Wednesbury               400        30      30/                 Increase of 2/.
Workington               170        70      28/ to 36/          Increase of 30%.



It is suggested that the following are the summary tables 
which should be inserted in a report dealing with the answers. 

The figures are given here for only one society, but the 
tabulations are framed so as to include all. 

TABLE I. — State of Employment.

Name of     Total Number* in    Number Out   Percentage     Median of the
Society.    Branches making     of Work.     Out of Work.   Percentages Out of
            Returns on                                      Work in the
            Employment.                                     Various Branches.

A.S.E.      55,170              7,142        13             12
O.S.B.
&c.

* Details of some of the most important branches should be added.
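The two summary figures of Table I. can be recomputed roughly as follows. This is a sketch under assumptions: the branch figures are as read from a damaged copy of the returns (so individual digits may be wrong), Dundee's 45 per cent. has been converted to 630 men, branches giving no return on employment are omitted, and the totals 55,170 and 7,142 are those printed in the table.

```python
from statistics import median

# (district, number in district, number out of work) as read from the returns.
returns = [
    ("Belfast", 1100, 130), ("Coventry", 2500, 230), ("Dukinfield", 170, 20),
    ("Dundee", 1400, 630),  # 45 per cent. of 1,400
    ("Glasgow", 28000, 4000), ("Glasgow (St Rollox)", 1600, 250),
    ("Hartlepool", 1200, 400), ("Glossop", 135, 10), ("Liverpool", 280, 38),
    ("Monifieth", 114, 18), ("Nottingham", 4000, 600), ("Oldham", 1600, 96),
    ("Preston", 630, 40), ("Preston", 900, 120), ("Shipley", 201, 15),
    ("Sowerby Bridge", 1120, 43), ("Sunderland", 3200, 400), ("Swindon", 6050, 2),
    ("Wednesbury", 400, 30), ("Workington", 170, 70),
]

total_out = sum(out for _, _, out in returns)

# Percentage out of work on the printed totals of Table I.
percentage_out = 100 * 7142 / 55170

# Median of the percentages out of work in the separate branches.
branch_percentages = [100 * out / members for _, members, out in returns]
median_percentage = median(branch_percentages)

print(total_out, round(percentage_out), round(median_percentage))
```

The percentage on the totals is dominated by the largest branch, while the median gives each branch equal weight; that the two come out close together here is a feature of these returns, not of the method.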






TABLE II. — Current Wages.

Name of     Average of Wages in Branches.    Quartiles of      Measure of
Society.    Unweighted.     Weighted.        Branch Wages.     Dispersion
                                                               (v. p. 136).
            s.  d.          s.  d.           s. d.    s. d.
A.S.E.      30  0           29  7            28 0     32 0     1/15
O.S.B.
&c.


TABLE III.
A. Change of Wage between 1865 and 1885.

            Number of Branches showing          Median of    Percentages of Members in
                                                Percentage   Branches showing
Name of     No       De-      No       In-      Increases.   No       De-      No       In-
Society.    Answer.  crease.  Change.  crease.               Answer.  crease.  Change.  crease.

A.S.E.      4        1        5        13       10           11       4        6        79
O.S.B.
&c.



Verbal Summary. — In the great majority of cases a considerable
increase of wage took place between 1865 and 1885,
equivalent on the whole to a rise of about 10 per cent. The
figures are not sufficiently definite to give an exact average.

Table III. — B. Change of Wage between 1865 and the
Maximum about 1873.

Table III. — C. Change of Wage between Maximum about
1873 and 1885.

(Tabulation as in III. A.)



CHAPTER VII.
THE GRAPHIC METHOD.

I. General Purpose.

[Side-note: Averages and diagrams.]
The two main methods of elementary statistics which ought to
be understood by all students or officials who handle figures,
which are easily within the grasp of all independently of
mathematical training, but are generally misunderstood or ignored by
the uninterested or the uninitiated, are the method of averages
and the method of diagrams or the graphic method. These two
are placed together because the uses of averages and diagrams
are nearly related. When we deal with large and complex
masses of figures we are unable to grasp them in their entirety,
however clearly they may be tabulated. Any list of figures (the
populations of different towns, the death-rates at successive
ages, the wages of many workpeople, the imports for a series of
years) becomes less comprehensible as its length increases. A
series of ten numbers can, perhaps, be easily grasped, of twenty
only with an effort; while a printed list of figures for one
hundred successive years leaves hardly any impression on our
mind at all; we cannot see the wood for the trees. The test to
which all questions as to the use of averages should be referred
is that the averages selected should afford the best summary of
the whole group in question that the mind can grasp. When the
meaning of the word average was sufficiently extended, we found
that we could select three, four, or even ten suitable figures
which adequately showed the main features of any group. The
main use of diagrams is also to present large groups of figures
so that they shall be intelligible in their entirety, and the
test for all diagrams is that the diagram as drawn should afford
the best view of the series or group of figures that the eye can
appreciate. Diagrams have one use which averages have not, for
it is only by a diagram that a series of figures relating to
successive years can be adequately presented; but in reality
they are less essential than averages, for the latter often have
an existence independently of the figures from which they are
derived, representing true types of the quantities which are
being measured; and by their use alone are further comparisons
of complex groups made possible: while diagrams, on the other
hand, might be dispensed with, being auxiliary rather than
essential, merely an aid to the eye and a means of saving time.

[Side-note: Graphic representation of averages.]
To connect this chapter more closely with the preceding, we
will show how the same group of figures, for example the wages
of a large group of workpeople, can be represented by either
method.

Consider the following data:



Numbers of workpeople earning:

From 15/ to 16/      200
     16/ to 17/      400
     17/ to 18/      100
     18/ to 19/      100
     19/ to 20/      200      (15/ to 20/: 1,000)

     20/ to 21/      200
     21/ to 22/      300
     22/ to 23/      300
     23/ to 24/      500
     24/ to 25/      900      (20/ to 25/: 2,200)

     25/ to 26/    1,200
     26/ to 27/      800
     27/ to 28/      700
     28/ to 29/      500
     29/ to 30/      300      (25/ to 30/: 3,500)

     30/ to 31/      300
     31/ to 32/      400
     32/ to 33/      400
     33/ to 34/      500
     34/ to 35/      500      (30/ to 35/: 2,100)

     35/ to 36/      [?]
     36/ to 37/      [?]
     37/ to 38/      [?]
     38/ to 39/      [?]
     39/ to 40/      [?]



Using the method of averages we should replace this group by
the following figures:

                            s.  d.
Average of all              27  6
   "    "  lowest 1,000     17  0
   "    "  highest 1,000    36  6
   "    "  middle 4,000     27  0

or

Median, 26/9; quartiles, 24/2, 32/.

Deciles, 20/, 23/6, 24/9, 25/8, 26/9, 28/2, 31/, 33/4, 35/4.

Mode, 25/3; secondary positions, 16/6, 36/.

or

Persons earning from   15/ to 20/  20/ to 25/  25/ to 30/  30/ to 35/  35/ to 40/
Percentages of all         10          22          35          21          12
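The median and the lower quantiles of such a grouped table can be found by accumulating the class numbers and interpolating within the class where the required rank falls. The sketch below assumes the shilling classes from 15/ to 35/ as they can be read from the data, a total of 10,000 workpeople (implied by the percentages, 10 per cent. being 1,000), and an even spread of earners within each class; the separate classes above 35/ are not available, so quantiles lying there are not attempted.

```python
# (lower limit of class in shillings, number of workpeople), classes 1s. wide.
classes = [
    (15, 200), (16, 400), (17, 100), (18, 100), (19, 200),
    (20, 200), (21, 300), (22, 300), (23, 500), (24, 900),
    (25, 1200), (26, 800), (27, 700), (28, 500), (29, 300),
    (30, 300), (31, 400), (32, 400), (33, 500), (34, 500),
]
TOTAL = 10_000  # assumption: total of the whole group, including 35/ to 40/


def quantile(q):
    """Wage (in decimal shillings) below which a fraction q of the group lies."""
    target = q * TOTAL
    running = 0
    for lower, count in classes:
        if running + count >= target:
            # Linear interpolation inside the class, assuming an even spread.
            return lower + (target - running) / count
        running += count
    raise ValueError("quantile lies above 35/, where class figures are not available")


print(quantile(0.50))   # 26.75 shillings, that is 26/9
print(quantile(0.25))   # about 24.22 shillings
```

The median of 26.75 shillings is the 26/9 given above; the lower quartile comes to a little over 24s. 2d., agreeing with the printed 24/2 to the nearest penny below.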



[Diagram facing p. 145: graphic representation of the group of wages tabulated above.]

[Side-note: Construction of simple diagrams.]
This group is represented on the annexed diagram, an
example of the graphic representation of the relation between
two variable quantities. A figure similar to this may be used
to show birth, marriage, or death rates at different years,
numbers of persons of various statures, demand at different
prices, or any such group of homogeneous quantities. The same
construction can be used to show the changing values of any
number in a series of years. Draw a line parallel to the bottom
of the page, and mark equal intervals to represent a quantity
which can have many successive small increments, such as age,
income, height, price, time, and so on. This is called the axis
of abscissæ, and the distance of a point measured from the zero
position along the line is called its abscissa. At right angles
to this line, parallel to the side of the paper, through the
zero position we draw another, called the axis of ordinates, and
grade this to correspond to the numbers possessing the qualities
represented by the abscissæ; at each grade on the axis of
abscissæ, draw lines at right angles to it, to represent on the
chosen scale the numbers at that grade; these lines are called
the ordinates. In the annexed diagram the abscissæ represent the
amounts of wages, the ordinates the number of persons earning
them. Join the tops of the ordinates by straight lines and the
diagram is complete. In practice, when squared paper is used,
without drawing the ordinates their tops can be marked.

[Side-note: Description of the wage ….]
This diagram shows at one glance the distribution of the
wage-earners according to their wages. A small number earned
between 15s. and 16s., a slightly larger group between 16s. and
17s., very few between 17s. and 19s. Above 19s. the number
continually rises; high numbers are found from 24s. to 27s., the
highest between 25s. and 26s. The line falls to the 30s. group,
but not so low as between 17s. and 19s., then it rises regularly
to 36s., and falls rapidly to 39s. Here, then, we have the main
group congregated in the neighbourhood of 25s., a distinct but
smaller group at 36s., and a small and nearly isolated group at
16s.; representing a considerable group of highly-skilled men
between 30s. and 40s., the great mass with ordinary skill between
20s. and 30s., and a small group of incompetents at 16s. These
features would not be so easily seen from the tabulated figures.

It is to be noticed that the number tabulated as between
15s. and 16s. is represented by the ordinate at 15s. 6d., the
middle of the interval; if the original figures on which the
table was based had been given to the nearest 1d., the ordinate
should be drawn at 15s. 5½d.* It is important that these middle
points should be accurately placed.
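The placing of the ordinates at the middle points of the intervals may be sketched as follows. The class figures are those of the wage group as they can be read from the data for 15/ to 35/; the function names are illustrative, and the second function covers the case of figures recorded to the nearest penny.

```python
# (lower limit of class in shillings, number of workpeople), classes 1s. wide.
classes = [
    (15, 200), (16, 400), (17, 100), (18, 100), (19, 200),
    (20, 200), (21, 300), (22, 300), (23, 500), (24, 900),
    (25, 1200), (26, 800), (27, 700), (28, 500), (29, 300),
    (30, 300), (31, 400), (32, 400), (33, 500), (34, 500),
]


def polygon_vertices(classes, width=1.0):
    """Tops of the ordinates: (abscissa at middle of interval, number in class)."""
    return [(lower + width / 2, count) for lower, count in classes]


def middle_in_pence(lowest_pence, highest_pence):
    """Middle of an interval whose recorded values run from lowest to highest penny."""
    return (lowest_pence + highest_pence) / 2


vertices = polygon_vertices(classes)
highest_point = max(vertices, key=lambda vertex: vertex[1])

print(vertices[0])                    # (15.5, 200), i.e. the ordinate at 15s. 6d.
print(highest_point)                  # (25.5, 1200)
print(middle_in_pence(180, 191))      # 185.5 pence, i.e. 15s. 5½d.
```

Joining the vertices in order by straight lines gives the diagram described; the highest vertex falls in the class 25s. to 26s., as the description of the figure states.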

The use of the line joining the tops of the ordinates is twofold.
First, it enables the eye to judge relative heights more
easily; and secondly, it suggests the idea of continuity, which
can be better illustrated by the next diagram. In this the
abscissæ represent ages, the ordinates the estimated numbers of
persons living at and above the ages at which they stand per
thousand inhabitants of England and Wales at the middle of the
year 1891. The ordinates were drawn at the points on the axis of
abscissæ representing the middle of each year of age; but length
of life cannot be expressed exactly in years, or even in months,
days, or minutes. The intention of the diagram is to show the
proportion living above each age, and for this purpose the
joining line should have no breaks or sharp angles, but should
suggest absolute continuity.

In practice, it is useless to mark in the points for smaller 
intervals than a year, for the eye could not grasp the detail. 
It is, however, implied that the line drawn has the same shape 
as that which would result if the number of persons was infinite 
and the subdivision by age infinitesimal. 

Estimated number per 1,000 of the population at and above:

Ages.          Ages.         Ages.         Ages.         Ages.
  0   1,000     16   628      32   346      49   152      65    47
  1     973     17   607      33   332      50   143      66    43
  2     949     18   587      34   318      51   135      67    38
  3     925     19   567      35   305      52   127      68    34
  4     901     20   547      36   292      53   119      69    31
  5     877     21   528      37   280      54   112      70    27
  6     854     22   510      38   268      55   104      71    24
  7     830     23   491      39   256      56    98      72    21
  8     807     24   474      40   244      57    91      73    18
  9     783     25   456      41   233      58    85      74    15
 10     760     26   439      42   222      59    79      75    13
 11     738     27   423      43   211      60    73      76    11
 12     715     28   407      44   201      61    67      77     9
 13     693     29   391      45   191      62    62      78     8
 14     671     30   376      46   181      63    57      79     6
 15     649     31   361      47   171      64    52      80     5
                              48   161

Calculated from the Census of 1891.
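Two simple uses of a table of numbers "at and above" each age can be sketched in a few lines: the number in a single year of age is the difference of two successive entries, and the age above which a given number remain can be found by interpolation, on the supposition of continuity. The five entries below are taken from the table; the function names and the straight-line interpolation are assumptions of this sketch.

```python
# Numbers per 1,000 of the population at and above each age (from the table).
at_and_above = {20: 547, 21: 528, 22: 510, 23: 491, 24: 474}


def number_in_year_of_age(age):
    """Persons per 1,000 aged `age` last birthday."""
    return at_and_above[age] - at_and_above[age + 1]


def age_with_remaining(target):
    """Age above which `target` per 1,000 remain, by straight-line interpolation."""
    ages = sorted(at_and_above)
    for lower, upper in zip(ages, ages[1:]):
        a, b = at_and_above[lower], at_and_above[upper]
        if a >= target >= b:
            return lower + (a - target) / (a - b)
    raise ValueError("target lies outside the ages supplied")


print(number_in_year_of_age(22))            # 19
print(round(age_with_remaining(500), 2))    # 22.53
```

On these figures half the population was above an age of about 22½ years, a result which the smooth curve of the diagram allows the eye to read off directly.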














* See p. 88, supra. 











[Diagram: Numbers per 1,000 of the Population above Assigned Ages. Vertical axis: Numbers; horizontal axis: Ages.]

Apply these remarks to the diagram facing p. 145. Average 
earnings for a year will not be reckoned exactly by shillings 
or even pence ; if we had a sufficient number of instances we 
should get regular sequences of earners at successive farthings, 
and the line representing them would have no sharp angles, 
but be continually curved. The figure rightly gives the eye 
this impression of continuousness. Similarly in the diagram 
representing exports facing p. 151, the line correctly gives the 
impression that exports are continuous day by day. 

By an obvious step we may suppose that the unit of area, that
contained between vertical lines through two consecutive divisions
on the axis of abscissæ, and horizontal lines through
two consecutive divisions on the axis of ordinates,
represents one wage-earner, and it is then easy to see that the
area contained between the base line, the curve, and two vertical
lines through the points marking any two amounts of wage
represents the total number earning rates between those amounts.

Hence the lines (see p. 145) through M, the position of
the median, Q1, Q3 those of the quartiles, D1, D2, D3, D4, D5, D6,
D7, D8, D9 of the deciles divide the area ABm1m2m3CD into two,
four, and ten equal areas respectively. The centre of gravity
of this figure lies on the vertical line through V, the average
wage; and the feet of the ordinates through the highest points
m1, m2, m3 are at the modes.




[Side-note: Requisite accuracy.]
The details of technique of diagram drawing, the position
of the scales, the devices for making the figure clear, and so
on, can be gathered from the various diagrams given in this
chapter. The degree of accuracy to which the figures should be
marked, whether correct to a million, a thousand, or a unit, is
determined simply by the power of the eye to grasp detail; in
most of those here given it will be found that a displacement of
one in a thousand is perceptible, and this is the ordinary limit.
More minute accuracy is useless, for it is not the function of
diagrams to dispense with lists of numbers, but only to enable
the eye to perceive their significant features.

Before discussing the choice of scales on which the nunibers 
are to be represented, it is necessary to consider the ways in 

which a diac^ram makes an impression on the eye. 
The eye can judge — (i) Distances; (2) ratios; (3) 
angles. The dotted lines in the diagram facing p. 1 5 1 will illus- 
trate these points. ( i .) The eye is a fairly safe judge of distances ; 
there is very little doubt which of two points is the further 
from the base line ; when squared paper is used, a difference 
of I in 1,000 is perceptible. The eye can also judge differences 
quickly. In the figure the value of the exports in 1883 exceeded 
that in 1885 by more than the value in 1890 exceeded that in 
1883. (2.) It can be quickly seen that the value of exports 
doubled between 1862 and 1889; or that the value in 1878 is three- 
quarters of that in 1890. The accuracy with which the eye can 
make such measurements is not great; it is not easy to detect 
that the ratio of the values in 1873 and 1 871 (1.095 • i) is greater 
than the ratio of the values in 1882 and 1880 (1.073 ' l but the 
general impression given by the diagram is partly made up by 
unconscious calculations of this nature. To make these obser- 
vations accurately the method described on pp. 188-9 should be 
used. Notice that for these observations the insertion of the 
base line is necessary ; and, because they are made unconsciously, 
there are very few cases where a diagram without a base line 
gives a correct impression. (3.) The question, Was the increment 
greater in 1887-88 or in 1888-89? can be more quickly answered 
by observing the angles than by noting the differences. The 
line showing the latter change is steeper (makes a greater angle 
with the horizontal) than the line showing the former. Hence 
the latter increase is the greater : actually £14,400,000 against
£12,600,000. The most useful exercise of this power, however,
is to judge the dates at which the rate of increase changed ; thus 
the value of exports increased in 1862-63, increased at a slower 
rate in 1863-64, and slower yet in 1864-65, more rapidly in 
1865-66 ; a slow fall followed in 1866-67, then an increase began
which is continually accelerated to 1871, and so on. The line 
from 1872-76 is concave to the base line, showing an accelerated 
fall ; the concavity from 1879 to 1882 corresponds to a retarded 
rise ; at 1888 convexity gives place to concavity, for at that date
the rate of increase began to diminish. 
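
The comparison of slopes can be checked on the figures themselves. What follows is a minimal sketch, not part of the original method: the function names are hypothetical, and the values (millions of pounds) are those printed for 1886-1890 in the table of exports.

```python
# Annual export values in millions of pounds, copied from the table
# of exports for the years 1886-1890.
exports = {1886: 212.7, 1887: 221.9, 1888: 234.5, 1889: 248.9, 1890: 263.5}


def increments(series):
    """Year-to-year changes: the slope of each segment of the line."""
    years = sorted(series)
    return {(a, b): round(series[b] - series[a], 1)
            for a, b in zip(years, years[1:])}


def rate_changes(series):
    """Second differences: positive where the rise quickens,
    negative where it slackens."""
    inc = list(increments(series).values())
    return [round(b - a, 1) for a, b in zip(inc, inc[1:])]


steps = increments(exports)
print(steps[(1887, 1888)], steps[(1888, 1889)])
print(rate_changes(exports))
```

The two increments printed are those compared in the text, in millions of pounds; the second differences are one numerical counterpart of what the eye reads as a change of slope.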

[Side-note: Choice of scale.]
It is difficult to lay down rules for the proper choice of the
scales by which the figure should be plotted out. It is only the
ratio between the horizontal and vertical scales
that need be considered. The figure must be
sufficiently small for the whole of it to be visible at once ; if the 
figure is complicated, relating to a long series of years and vary- 
ing numbers, minute accuracy must be sacrificed to this con- 
sideration. Supposing the horizontal scale decided, the vertical 
scale must be chosen so that the part of the line which shows 
the greatest rate of increase is well inclined to the vertical, 
which can be managed by making the scale sufficiently small ; 
and, on the other hand, all important fluctuations must be 
clearly visible, for which the scale may need to be increased. 
Any scale which satisfies both these conditions will fulfil its 
purpose. The annexed page shows the erroneous impressions 
which can be given by a judicious manipulation of the scale 
and by the omission of the base line. The diagrams, which 
are drawn roughly, all represent the same estimates of wages in 
England and in the United States of America for certain years 
from 1860. Figure 1 sets the lines in proper relief.
[Side-note: Necessity of correct base line.] In figure 2,
the base line is not drawn in the zero position
for the English scale, and the American scale is
reduced ; the consequence is that English wages
appear to have fluctuated widely, while American made steady 
progress. In figures 3, 4, and 5 the scales are doctored and the 
base line adjusted, so that in 3 American wages seem to have 
caught up English, in 5 exactly the reverse is the case, while in 
4 wages appear to have moved with equal rapidity in both 
countries. An examination of these figures will show that the 
eye cannot be trusted to supply the right base line, or to 
estimate the importance of fluctuations without it; and, with
certain exceptions to be mentioned later,* it is well to distrust
all those numerous diagrams, where space has been economised
at the expense of the base line.

[Diagram: Total Value of British and Irish Produce Exported, yearly from 1855.]
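
The distortion produced by a displaced base line can be put in numbers. The sketch below is an illustration only: the two wages are hypothetical, and `apparent_ratio` is a name chosen here, not one used in the text.

```python
def apparent_ratio(low, high, base):
    """Ratio of the plotted heights of two values when the vertical
    scale starts at `base` instead of zero."""
    if base >= low:
        raise ValueError("base line must lie below both values")
    return (high - base) / (low - base)


# Hypothetical weekly wages: a true rise of one quarter.
wage_then, wage_now = 20.0, 25.0

true_ratio = apparent_ratio(wage_then, wage_now, base=0.0)
cut_ratio = apparent_ratio(wage_then, wage_now, base=15.0)
print(true_ratio, cut_ratio)
```

With the base line at zero the heights stand in the true ratio; with the scale cut off at 15 the same rise appears as a doubling.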

Total Declared Real Value of British and Irish Produce
Exported from the United Kingdom. 1 = £1,000,000.

                  Averages.                              Averages.
Year. Value.  Three   Five    Ten         Year. Value.  Three   Five    Ten
              Yearly. Yearly. Yearly.                   Yearly. Yearly. Yearly.

1855   95.7     ...     ...     ...       1881  234.0   216.2   208.2   221.6
1856  115.8     ...     ...     ...       1882  241.5   232.9   216.7   220.1
1857  122.0   111.2     ...     ...       1883  239.8   238.4   226.0   218.6
1858  116.6   118.1     ...     ...       1884  233.0   238.1   234.3   217.9
1859  130.4   123.0   116.1     ...       1885  213.1   228.6   232.3   216.9
1860  135.9   127.6   124.1     ...       1886  212.7   219.6   228.0   218.1
1861  125.1   130.5   126.0     ...       1887  221.9   215.6   224.1   220.4
1862  124.0   128.3   126.4     ...       1888  234.5   223.0   223.0   224.5
1863  146.5   131.9   132.4     ...       1889  248.9   235.1   226.2   230.2
1864  160.4   143.7   138.4   127.2       1890  263.5   249.0   236.3   234.2
1865  165.8   157.6   144.4   134.3       1891  247.2   253.2   243.2   235.5
1866  188.9   171.7   157.2   141.6       1892  227.1   245.9   244.2   234.1
1867  181.0   178.6   168.7   147.5       1893  218.1   230.8   240.9   231.9
1868  179.7   183.2   175.1   153.8       1894  215.8   220.3   234.3   230.2
1869  190.0   183.6   181.    159.8       1895  225.9   219.9   226.8   231.4
1870  199.6   189.8   187.8   165.9       1896  240.1   227.3   225.4   234.1
1871  223.1   204.2   194.6   175.7       1897  234.3   233.4   226.8   235.4
1872  256.3   226.3   209.7   188.9       1898  233.4   235.9   229.8   235.3
1873  255.2   244.9   224.8   200.0       1899  255.3*  241.0   237.8   236.1
1874  239.6   250.4   234.7   207.9       1900  283.6*  257.4   249.3   238.1
1875  223.5   239.4   239.6   213.7       1901  270.9*  269.9   255.5   240.5
1876  200.6   221.0   235.1   214.9       1902  277.7   277.4   264.2   245.5
1877  198.9   207.7   223.7   216.7       1903  286.5*  278.4   274.8   252.3
1878  192.8   197.4   210.9   218.0       1904  296.3*  286.8   283.0   260.4
1879  191.5   194.4   201.4   218.1       1905  324.4*  302.4   291.2   270.2
1880  223.1   202.5   201.3   220.5       1906  367.0*  329.2   310.4   282.9

* Not including the newly reckoned value of ships exported.

[Side-note: Smoothing of curves.]
We can now pass on to the consideration of the smoothing
of curves, for which purpose the question of the "alleged
stationariness of our exports," discussed by Sir R.
Giffen in his paper before the Royal Statistical
Society in 1899, affords an excellent illustration. The thin
dotted line on the diagram opposite shows the value of exports
year by year, and the first impression given by it is that exports
have not grown in value in recent years. Sir Robert Giffen
gave the following table :

Average Annual Value of Exports.

1855-57        £134,000,000
1865-67         228,000,000
1875-77         264,000,000
1885-87         274,000,000
1895-97         292,000,000

* See pp. 188-194, infra.

and from this he deduced " that all through there is an increase, 
and that the only sign of stationariness is an increase at a less 
rate in the last periods than in the earlier periods." 

The Saturday Review* wrote "that such a conclusion is
grossly misleading," for the figures are merely triennial averages
of selected years showing a happy coincidence ; "why was not
1898 included ? " An inspection of the numbers does not show
us the answer to this criticism, but on the diagram the whole 
circumstances are visible at a glance. Since 1865 three great 
waves have been completed. The maximum of 1872, due to the 
inflated prices of that year, is very high, but that of 1890 is 
greater than any previous figure, while the maximum in 1882 is 
comparatively low. The minima increase throughout; those 
of 1868, 1879, 1886 show a regular progression, which falls off 
greatly in 1894. In 1894-96 it looked as if another decennial
cycle was in progress, but this has been checked. Since the 
discussion, the returns for the successive years to 1906 have 
shown an increase, surpassing that which preceded 1872. 

The Saturday Review went on to ask why Sir Robert Giffen 
did not give " proper quinquennial averages," such as 

Average Annual Value of Exports.

1870-74        £235,000,000
1880-84         234,000,000
1890-94         234,000,000
1898            233,000,000

and it must be granted that this gives an appearance diametrically
opposite to that of the previous table.

* January 1899, pp. 66, 67.

It is clear that we need some general method of bringing
these figures into a form which shall be quite independent of the
choice of any special years. The diagram facing page 151 does
this. The thick continuous line, lying almost over the dotted line
of annual values, shows triennial averages taken yearly, that
is the average of each year with those before and after it ; this
line smooths off the corners without affecting the general
appearance. The line of crosses shows quinquennial averages, each
year being averaged with the two previous and two subsequent
years. The line of circles shows decennial averages; each circle
is placed at the centre of the period whose average it represents ;
thus the circle showing the average of the ten years 1875-84 is
placed vertically over the line separating the years 1879 and
1880.*
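
The averages taken yearly may be computed as below. This is a sketch under stated assumptions: `moving_average` is a hypothetical helper, and the ten values are those printed for 1875-84 in the table of exports. The printed table enters each average against the last year of its period, whereas the diagram, as described above, places the mark at the centre; the sketch returns the centre.

```python
def moving_average(values, start_year, span):
    """Average every run of `span` consecutive years.

    Returns (centre, average) pairs, the centre being the mean of the
    years averaged, so that a ten-year average falls between two years.
    """
    out = []
    for i in range(len(values) - span + 1):
        window = values[i:i + span]
        centre = start_year + i + (span - 1) / 2
        out.append((centre, round(sum(window) / span, 1)))
    return out


# Exports, millions of pounds, 1875-1884, from the table.
exports_1875_84 = [223.5, 200.6, 198.9, 192.8, 191.5,
                   223.1, 234.0, 241.5, 239.8, 233.0]

triennial = moving_average(exports_1875_84, 1875, 3)
decennial = moving_average(exports_1875_84, 1875, 10)
print(triennial[0])
print(decennial)
```

The single decennial average falls at 1879.5, that is, over the line separating 1879 and 1880.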

[Side-note: Choice of years.]
On looking at the line of quinquennial averages it is clear
that the Saturday Review did precisely what it accused Sir
Robert Giffen of doing, for years are taken which
favour the argument. The quinquennial periods
selected for comparison with 1898 are all on the upper parts
of the waves, the marks showing these averages are very near
the maxima of the quinquennial line, while the year 1898 does
not appear to be a maximum. We might with just as much or
as little accuracy give the following :

Quinquennial Averages of the Values of Exports.

1865-69        £181,000,000
1875-79         201,000,000
1885-89         226,000,000
1898            233,000,000

and say that the value in 1898 was higher than any of the pre- 
vious selected averages. There is no need to use arbitrary dates 
to get at the facts. No argument can stand which does not take 
account of the cycle of trade, which is not eliminated till we 
take decennial averages. Special marks in the diagram show the 
averages for decennial periods, indicating a rapid increase before 
1870, followed by steady slow progress till the recent expansion. 
The complete line gives just the same general appearance. If, 
finally, the figures were completely smoothed by a freehand line 
keeping as close to this as was possible, without making sudden
changes of curvature, the same appearance would be given ; the
thick line on the diagram is an attempt to do this. The smoothing
is obtained by the assumption that the cycle of trade is ten
years ; when two maxima fall within the same ten years the
average of this period by our construction gives the appearance
of a maximum (e.g., in 1887) at a date of a minimum. This
would be avoided if we continually changed our period for
averaging to accommodate the changing wave-length, a somewhat
arbitrary proceeding. The difficulty thus arising can be
easily corrected by the eye, and the final smoothed line is
intended to convey this corrected impression.
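
How far the choice of periods governs the result may be seen by computing both sets of quinquennial averages from the same yearly figures. The sketch below is illustrative only; `period_average` and the variable names are hypothetical, and the values are those printed for 1865-1894 in the table of exports.

```python
# Exports in millions of pounds for 1865-1894, from the table.
FIRST_YEAR = 1865
EXPORTS = [165.8, 188.9, 181.0, 179.7, 190.0,   # 1865-69
           199.6, 223.1, 256.3, 255.2, 239.6,   # 1870-74
           223.5, 200.6, 198.9, 192.8, 191.5,   # 1875-79
           223.1, 234.0, 241.5, 239.8, 233.0,   # 1880-84
           213.1, 212.7, 221.9, 234.5, 248.9,   # 1885-89
           263.5, 247.2, 227.1, 218.1, 215.8]   # 1890-94


def period_average(first, last):
    """Mean annual value over the years first..last inclusive."""
    chunk = EXPORTS[first - FIRST_YEAR:last - FIRST_YEAR + 1]
    return sum(chunk) / len(chunk)


# Periods beginning with each decade, as chosen by the Saturday Review.
upper = [round(period_average(y, y + 4)) for y in (1870, 1880, 1890)]
# The same length of period, begun five years earlier.
shifted = [round(period_average(y, y + 4)) for y in (1865, 1875, 1885)]
print(upper)
print(shifted)
```

The first list reproduces the appearance of stationariness, the second the appearance of steady growth, though both are drawn from one series.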

* In all the curves of averages the mark showing the average is placed at
the centre of gravity of the marks showing the 3, 5, or 10 quantities averaged.

It should be clear now that it was in 1899 five years too 
soon to pay attention to the particular figure for 1898; the 
figures for the next five years, necessary to determine the char- 
acter of the coming wave, could not be foretold. When these 
are included it is seen that each decennial average (for 1890-99, 
1891-1900, &c.) established a new record, and that the figures
for each year from 1900 to 1906 are greater than those of any
previous maximum. It will be seen, moreover, that the sentence
quoted from Sir Robert Giffen on p. 152 is fully justified.

[Side-note: Meaning of smooth line.]
The smoothed line now constructed represents the general
tendency of the value of exports, when accidental and temporary
variations are removed. If it were possible
to separate entirely variations of short period from
secular changes, to separate the ebb and flow of the tide of 
commerce from the steady current of increasing trade, we may 
suppose that we should obtain a result represented by this line. 
In it there are no sudden changes even in rates of growth, while 
the addition and subtraction year by year of relatively small 
quantities would produce precisely that irregular fluctuating line 
from which the smooth line was obtained. 

[Side-note: Smoothing a homogeneous group.]
The fuller discussion of "smoothing" series of figures belongs
to the chapter on interpolation, but one other group may
here be considered, as showing the use of the
graphic method for obtaining regularity out of
irregular raw material. Referring back to the
figures given on p. 120, the wages of 5,000 workers can be
expressed anew by a diagram, in which the ordinates represent
the numbers earning at or above a certain wage. The thin
angular line on the adjacent page represents these numbers,
entered for every 10-cent group. This plan is especially useful
for irregular figures, like this wage-group, for the line must 
always tend upwards from the numbers earning the highest 
wage to the numbers earning at least the lowest. The diagram 
is also at once adaptable to the graphic method of finding the 
median described on p. 127. 

The irregularities shown by the thin line do not arise from 
any law of wage-grouping, but are due to the accidents of obser- 
vation ; if we regard these returns as samples out of a much 
larger unregistered group, we may suppose that a smoothed 
curve will indicate approximately the form which would be 
obtained, if our returns were complete. To smooth this figure,
draw a freehand line passing as near the points as possible
without abrupt changes of curvature, as in the annexed diagram.
[Side-note: Graphic method of finding the median.] A new
approximation may be made for the median, quartiles,
&c., by drawing horizontal lines through the points
on the vertical scale corresponding to half, one-quarter,
three-quarters, &c., of the workers ; from
the points where these cross the smooth line, draw vertical lines
to the scale of dollars ; the points on the scale so obtained are
the median (quartile, &c.) wage.
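
The construction just described can be imitated numerically for the angular line, reading the crossing by straight-line interpolation between the plotted points. This is only a sketch: the returns are hypothetical (they are not the figures of p. 120), `wage_at_count` is a name invented here, and a freehand smooth curve would give slightly different readings.

```python
def wage_at_count(wages, at_or_above, target):
    """Wage at which the 'at or above' line crosses the level `target`.

    `wages` ascend; `at_or_above[i]` is the number earning wages[i] or
    more, so the counts descend. Straight-line interpolation stands in
    for reading the crossing off the angular line.
    """
    pairs = list(zip(wages, at_or_above))
    for (w0, n0), (w1, n1) in zip(pairs, pairs[1:]):
        if n0 >= target >= n1 and n0 != n1:
            return w0 + (w1 - w0) * (n0 - target) / (n0 - n1)
    raise ValueError("target lies outside the recorded counts")


# Hypothetical returns for 1,000 workers.
wages = [0.50, 1.00, 1.50, 2.00, 2.50, 3.00]
at_or_above = [1000, 800, 500, 250, 100, 0]

total = at_or_above[0]
median = wage_at_count(wages, at_or_above, total / 2)
lower_q = wage_at_count(wages, at_or_above, total * 3 / 4)
upper_q = wage_at_count(wages, at_or_above, total / 4)
print(round(lower_q, 2), median, upper_q)
```

The horizontal line at half the workers gives the median; the lines at three-quarters and one-quarter of the workers give the lower and upper quartiles respectively.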

The results obtained are :

                                        Median.   Quartile.  Quartile.
Given on p. 92                          $1.49       ...        ...
By method of p. 128, used in
  annexed diagram                       $1.49      $1.16      $2.12
From smooth curve in annexed diagram    $1.51      $1.15      $2.13
By method of interpolation, p. 253      $1.536      ...        ...

This method is not, however, one of great precision ; a very slight
change in the curvature of the smoothed line would make more
difference than those shown between the second and third lines
in the above table.

[Side-note: Graphic method of finding the mode.]
This method is more useful for determining the mode. It
will be remembered that the difficulties in doing this before
arose from the uneven distribution on the two
sides of the mode, and in the displacement of the
mode by the adoption of a second system of
tabulation. The first of these difficulties entirely disappears 
in the graphic method, while the second is diminished, for 
the displacement now only depends on the slight possible 
variations in the curvature of the smooth line. The mode is 
clearly the position where the greatest number is added, in the 
present method of representing the figures : that is, the mode is 
where the line, angular or smooth, is steepest. On the smooth 
curve the maximum steepness is where the tangent crosses the 
curve, — in mathematical language, at a point of inflexion. This 
can be determined mechanically by placing a ruler to touch the 
curve, and turning it round the curve till it crosses it. On the 
annexed figure this occurs in the interval between $1.10 to $1.40.
A more complex method of determining both mode and median
is discussed in Chap. X.
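
For the angular line, the search for the steepest part reduces to comparing the slopes of the successive segments. The following sketch assumes hypothetical counts at irregular wage intervals and a hypothetical helper `steepest_interval`; it locates only the interval containing the mode, and does not replace the ruler applied to a smooth curve.

```python
def steepest_interval(wages, at_or_above):
    """Interval of the wage scale over which the 'at or above' count
    changes fastest, i.e. where most workers are added per dollar."""
    best, best_slope = None, 0.0
    for i in range(len(wages) - 1):
        drop = at_or_above[i] - at_or_above[i + 1]
        slope = drop / (wages[i + 1] - wages[i])
        if slope > best_slope:
            best, best_slope = (wages[i], wages[i + 1]), slope
    return best, best_slope


# Hypothetical counts at irregular wage intervals.
wages = [0.50, 1.00, 1.25, 1.50, 2.00, 3.00]
at_or_above = [1000, 850, 650, 500, 250, 0]

interval, slope = steepest_interval(wages, at_or_above)
print(interval, slope)
```

Dividing by the width of each interval is what allows returns at irregular graduations to be treated by the same construction as regular ones.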

This graphic way of finding these averages has two great 
advantages. It can be applied to numbers which are given 
at irregular intervals of graduation (e.g., 30 at 30s. 6d., 40 at
30s. 8½d., 35 at 40s. 1d., &c.) as easily and by exactly the same
construction as to more regular returns ; and if the smooth curve 
is carefully drawn, the number of modes can be seen at a glance 
and the individual importance of each can be estimated. In 
the annexed diagram, the curve is concave to the base line from 
$.30 to about $1.20, convex from about $1.20 to $3.15, concave till 
$3.40, and then convex till the end. The points of inflexion or 
the modes are where concavity gives way to convexity. Hence
there are two modes, of which that near $3.40 is of the less
importance. The mathematical method of pp. 252-4 shows them 
to be at $1.10 and $3.20; the slight divergence between this 
result and that of p. 121 illustrates the difficulty of locating the 
mode exactly. 



[Side-note: Pictorial diagrams.]
A large class of diagrams may be passed by with a few
words. Writers and lecturers frequently use points, lines,
triangles, squares, circles, even pictures, of different
sizes to assist the presentation of the relative
magnitude of numbers. These have their use for popular
lectures and hand-books, but do not add anything to the
significance of the figures. Collections of these may be found in
the second volume of Gabaglio's Teoria Generale della Statistica,
and in M. Levasseur's La Statistique Graphique in the Jubilee
Volume of the Royal Statistical Society.

Of these one group may be signalled as of practical use. 
Rectangles may be used to express three quantities : one side 
to represent price ; the adjacent side, quantity ; and the area,
value: or number of houses, average number of inmates and
population : or number of hours' work per week, average output
or hourly wage, and total output or weekly wage. The figures 
on the annexed page show the limit to which this method can 
be usefully pushed. 



Representation of Three Facts by Rectangles.
Imaginary budgets of an artisan and a labourer, showing amounts
spent weekly on various commodities, and number of hours' work
necessary for each amount.

The horizontal scale represents pence per hour. .125 inch = 1d.

The vertical scale represents number of hours per week. .1 inch = 2 hours.

The areas represent amounts spent, and the whole rectangles show the
week's wages on the same scale. 1 sq. in. = 13s. 4d.
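
The scale of areas follows from the two linear scales, as a short calculation shows. The sketch below merely verifies the arithmetic of the caption; the constant and function names are hypothetical.

```python
PENCE_PER_SHILLING = 12

# Scales stated for the diagram of rectangles.
INCH_PER_PENNY_AN_HOUR = 0.125   # horizontal: .125 inch = 1d. per hour
INCH_PER_HOUR = 0.1 / 2          # vertical: .1 inch = 2 hours


def pence_per_square_inch():
    """Money represented by one square inch of the diagram."""
    pence_per_inch = 1 / INCH_PER_PENNY_AN_HOUR   # 8d. per hour
    hours_per_inch = 1 / INCH_PER_HOUR            # 20 hours
    return pence_per_inch * hours_per_inch


def rectangle_inches(rate_pence, hours):
    """Width and height, in inches, of the rectangle for one item."""
    return rate_pence * INCH_PER_PENNY_AN_HOUR, hours * INCH_PER_HOUR


shillings, pence = divmod(pence_per_square_inch(), PENCE_PER_SHILLING)
print(int(shillings), int(pence))
print(rectangle_inches(8, 20))
```

Twenty hours at 8d. an hour thus occupy exactly one square inch, which is 160d., or 13s. 4d., as the caption states.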



[Side-note: Maps.]
The use of statistical maps needs only a brief notice. Any
numerical quality of a population, its density, average income,
average taxation, may be shown district by district
by suitable markings, or colours. Of these the
most useful method is to choose one colour, say blue, for
excess above the average ; another, say red, for defect. Divide
the districts in nine groups, say more than 7 per cent., 5 to 7
per cent., 3 to 5 per cent., 1 to 3 per cent. above the average :
these should be marked by four shades of blue, becoming lighter
as the average is approached ; within 1 per cent. of the average,
above or below, should be white ; and shades of red, gradually
becoming darker, will show the remaining grades below the
average. Care must be taken not to adopt too many grades. 
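
The nine grades may be assigned mechanically, as in the sketch below. The limits are those suggested in the text; the function names, the numbering of the grades, and the treatment of a district lying exactly on a limit are assumptions made here for illustration.

```python
# Lower limits, in per cent. above or below the average, of the four
# shades on each side; deviations within 1 per cent. are left white.
BOUNDS = (1, 3, 5, 7)


def grade(value, average):
    """Return a grade from -4 (darkest red) to +4 (darkest blue)."""
    deviation = 100.0 * (value - average) / average
    steps = sum(1 for b in BOUNDS if abs(deviation) > b)
    return steps if deviation >= 0 else -steps


def colour(g):
    """Name the marking for a grade; 0 is white."""
    if g == 0:
        return "white"
    return f"{'blue' if g > 0 else 'red'} shade {abs(g)} of 4"


grades = [grade(v, 100.0) for v in (92, 96.5, 99.5, 102, 108)]
print(grades)
print([colour(g) for g in grades])
```

Four grades above, four below, and the white band about the average make up the nine groups.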



For examples of this method see Booth's Life and Labour of
the People, maps ; the Statistical Atlas of the XIth Census of the
United States ; the Statistical Atlas of India; and the maps in
M. Levasseur's paper just mentioned. A cheap and very effective
method, by which similar results are obtained in black and white
only, may be seen on Plate P (misprinted 2) in that paper, and in
the excellent chapter on Graphic Representation in Bertillon's
Cours élémentaire de Statistique, p. 133 seq.



[Diagram to face page 159.]



2. Historical Diagrams. 

Perhaps the chief use of diagrams is to afford a rapid view 
of the relations between two series of events. 

The different cases that occur are best illustrated by examples. 

[Side-note: Comparison of figures.]
The simplest is when we wish to compare two sets of figures
expressed in the same unit, say £ sterling ; and
the simplest of these when we wish simply to
compare a whole and its parts.

[Side-note: Illustrated by the revenue.]
On the adjacent diagram the upper line shows the annual total
gross revenue of the United Kingdom (Statistical Abstract, p. 9) ;
the next line, that part which comes from inland
revenue and customs, the difference being mainly
composed of post office receipts. The principal heads of revenue 
are customs, excise, income tax, and post office. These are shown 
by suitable lines for each year, each line being independent of 
the other, and all having the same base line and being on the 
same scale. This method is greatly preferable to the alternative 
one of drawing a second line representing the total less customs, 
a third the total less customs and excise, and so on, because the 
eye is then quite incapable of judging the relative movements of 
the separate items. The figure shows at once the main features of 
the course of revenue. The increase has been rapid but irregular. 
The growth in the Crimean War was too rapid to be at once 
maintained, but the figures for the 60's are at a far higher level
than those for the 50's. A rapid fluctuation in 1870 is followed
by a more regular growth almost unchecked till 1887 ; and then, 
after a short stationary period, there are great increases in 1895, 
and between 1898 and 1903. The same remarks apply to the line 
showing inland revenue and customs. If we look for the parts of 
the revenue that have borne the increase and change, we see that 
prior to 1900 receipts from excise had increased most, next 
those from the post office, and next those from the income tax, 
while the customs had diminished. Each line has its distinctive 
features. The post office payments show an almost regular 
growth. The income tax fluctuates violently, bearing the brunt 
of nearly all the rapid changes in the total, especially in 1856
and 1870, and 1900-02.

Revenue of the United Kingdom.
Unit, in all columns, £10,000.

Year ended    Total      Inland     Customs.  Excise.   Property   Post and
31st March.   Revenue.   Revenue                        and Income Telegraph.
                         and                            Tax.
                         Customs.

1850          5,739      5,431     2,226     1,497       [?]        216
1851          5,732      5,412     2,204     1,528       560*       228
1852          5,658      5,335     2,223     1,538       550*       237
1853          5,753      5,401     2,214     1,575       570*       237
1854          5,890      5,502     2,251     1,630       580*       252
1855          6,282      5,944     2,163     1,680*    1,070*       237
1856          7,026      6,601     2,324     1,730*    1,520*       281
1857          7,279      6,848     2,353     1,840*    1,620*       292
1858          6,788      6,309     2,311     1,782     1,159        292
1859          6,548      5,987     2,412     1,790       668        320
1860          7,109      6,570     2,446     2,036       960        331
1861          7,028      6,514     2,331     1,943     1,092        340
1862          6,986      6,412     2,367     1,833     1,036        351
1863          7,06?      6,390     2,403     1,715     1,057        365
1864          7,021      6,306     2,323     1,821       908        381
1865          7,031      6,291     2,257     1,956       796        410
1866          6,781      6,036     2,128     1,979       639        425
1867          6,943      6,156     2,230     2,067       570        447
1868          6,960      6,204     2,265     2,016       618        463
1869          7,259      6,422     2,242     2,046       862        466
1870          7,543      6,708     2,153     2,176     1,004        477
1871          6,994      6,106     2,019     2,279       635        527
1872          7,471      6,484     2,033     2,333       908        543
1873          7,661      6,660     2,103     2,578       750        583
1874          7,734      6,608     2,034     2,717       569        700
1875          7,492      6,397     1,929     2,739       431        679
1876          7,713      6,525     2,002     2,763       411        719
1877          7,857      6,636     1,992     2,774       528        730
1878          7,774      6,610     1,997     2,746       582        746
1879          8,115      6,899     2,032     2,740       871        757
1880          7,934      6,695     1,933     2,530       923        777
1881          8,187      6,895     1,918     2,530     1,065        830
1882          8,396      7,058     1,929     2,724       994        863
1883          8,739      7,313     1,966     2,693     1,190        901
1884          8,616      7,187     1,970     2,695     1,072        947
1885          8,799      7,380     2,033     2,660     1,200        966
1886          8,958      7,493     1,983     2,546     1,516        989
1887          9,077      7,611     2,015     2,525     1,590      1,028
1888          8,980      7,566     1,963     2,562     1,444      1,060
1889          8,847      7,360     2,007     2,560     1,270      1,118
1890          8,930      7,341     2,042     2,416     1,277      1,177
1891          8,949      7,358     1,948     2,479     1,325      1,226
1892          9,099      7,534     1,974     2,561     1,381      1,263
1893          9,040      7,480     1,971     2,536     1,347      1,288
1894          9,113      7,543     1,971     2,520     1,520      1,301
1895          9,468      7,865     2,011     2,605     1,560      1,334
1896         10,197      8,512     2,076     2,680     1,610      1,422
1897         10,395      8,597     2,125     2,746     1,665      1,477
1898         10,661      8,855     2,180     2,830     1,725      1,518
1899         10,834      8,945     2,085     2,920     1,800      1,586
1900         11,984      9,963     2,380     3,210     1,875      1,665
1901         13,038     10,956     2,626     3,310     2,692      1,725
1902         14,300     12,189     3,099     3,160     3,480      1,779
1903         15,155     12,993     3,443     3,210     3,880      1,838
1904         14,155     11,935     3,385     3,155     3,080      1,915
1905         14,337     12,053     3,573     3,075     3,?25      1,993
1906         14,398     11,987     3,447     3,023     3,135      2,10?

* These figures cannot be given accurately within £100,000.
[?] marks a figure or digit illegible in this copy.

The excise line shows stationariness till
1870, a sudden jump to 1874, and a very slow growth since that 
date. Customs, on the other hand, have to some extent taken an 
opposite course to that of excise, so that the total from the two 
had not changed very rapidly prior to 1900. There is a very 
marked stationariness since 1871 to 1899, but by 1903 the 
receipts had risen 65 per cent. At the top of the page a new 
base line is taken, and the number of pounds per head of the 
population is shown year by year; it will be seen that the 
only important increases were between 1851 and 1857, and from
1898 to 1903. 

[Side-note: Choice of second scale.]
So far we have found no more difficulty in the choice of
scales than previously when dealing with only one line, for all
the lines on the larger diagram indicate millions
of pounds, and when the unit is £1, a new base
line has been adopted. But we may need to show the change
of population on the larger diagram. It is necessary, as 
we have already seen, to use the same base line for the two 
quantities to be compared ; but we may choose any point for 
the beginning of the new line, adapting our vertical scale, for the 
eye can judge the proportionate changes wherever the line is 
placed. It is best to decide this point by defining the problem 
on which the comparison should throw light. If it is required to 
compare the growth of revenue with the growth of population 
since, say, 1850, we should start the new line at the point on 
the 1850 line where the revenue curve begins, and we can then 
see how the lines intersect one another again and again. Since 
1850, however, is an arbitrary date, this plan lacks definition, 
and it is more logical to make the lines coincide at the most 
recent date given, with which any previous date can then be 
compared. The plan adopted on the diagram given is another 
alternative ; the line is drawn on such a scale that it lies fairly 
close to that for inland revenue throughout the greater part of 
its course. 
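
The first two plans, making the lines coincide at the earliest or at the most recent date, amount to multiplying the second series by a single factor. The sketch below assumes round illustrative figures for population, which are not taken from the text; the revenue figures are those of the table, and `rescale_to_meet` is a hypothetical name.

```python
def rescale_to_meet(series, reference, year):
    """Multiply `series` by the factor that makes it equal `reference`
    in `year`, so both lines share the base line and meet at that date."""
    factor = reference[year] / series[year]
    return {y: v * factor for y, v in series.items()}


# Total revenue, unit 10,000 pounds, from the table of revenue.
revenue = {1850: 5739, 1906: 14398}
# Population in millions: round illustrative figures, not from the text.
population = {1850: 27.0, 1906: 43.0}

from_start = rescale_to_meet(population, revenue, 1850)
from_end = rescale_to_meet(population, revenue, 1906)
print(round(from_start[1906]), round(from_end[1850]))
```

Since only a factor is applied, the base line remains common to both quantities and the proportionate changes of the second line are unaltered, whichever date is chosen for the meeting point.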

[Side-note: Comparison of quantity and value.]
The next diagram, facing p. 164, introduces further difficulties
as to the choice of scales. The object of the figure is to
show the relations between quantity, value, and
price of imported wheat, and population. The line
A is first drawn on a scale chosen so as to throw its
fluctuations into relief. Population is at once brought into relation
with this by calculating the amount per head year by year.
The line C to represent these figures is drawn on a different
scale, chosen so that the line shall not cause confusion by continually
crossing any of the others on the figure. If the figure
was too full this could be treated as on p. 161, the revenue per
head. The same scale of years must be used, and for simplicity
of calculation and appearance, 100 lbs. consumed per head is
measured by the same vertical distance as 10,000,000
cwt. imported. [Side-note: Details of construction.] A and C refer
to the same quantities, and therefore similar lines are used in
both cases. The
line B represents value and is shown by a broken line. For
this line the choice of scale is more difficult. In the diagrams
which follow, instances will be shown where special methods are
used to bring out specific comparisons. Here this is not necessary,
and a scale is adopted which brings the lines A and B into
near relation, and shows the fluctuations of B, while the figure is
made simple and intelligible by the representation of £20 by the
same vertical distance as 20 cwt.

The line D shows the changing price of wheat as deduced 
from columns A and B. The scale is chosen so that it boldly 
crosses the lines A and B ; thus its fluctuations are clearly shown, 
and the numbers are easily seen, for 2s. per cwt. is represented 
by the same vertical line as 10,000,000 cwt. If the figure was 
accurately drawn, lines A and D would lie one over the other in 
1876-77 ; they are therefore shifted very slightly horizontally, 
and clearness is preserved without the general impression being 
vitiated. 
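
The deduction of column D from columns A and B is a simple division, the units of 100,000 cancelling. The sketch below checks it for three years of the table of wheat importations; the function name is hypothetical.

```python
SHILLINGS_PER_POUND = 20


def price_per_cwt(quantity_units, value_units):
    """Average price in shillings per cwt.

    quantity_units: column A, in units of 100,000 cwt.
    value_units:    column B, in units of 100,000 pounds.
    The two units cancel, leaving pounds per cwt.
    """
    return round(SHILLINGS_PER_POUND * value_units / quantity_units, 2)


# (A, B) for three years of the table of wheat importations.
rows = {1862: (500, 286), 1863: (309, 155), 1867: (391, 285)}
prices = {year: price_per_cwt(a, b) for year, (a, b) in rows.items()}
print(prices)
```

The results agree with the entries printed in column D for those years.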

[Side-note: Historical facts illustrated.]
This line B illustrates some very interesting facts. Its chief
characteristic is excessive fluctuation ; while a smoothed line
would show an upward tendency till 1878 and a
fall since that date. The fluctuations are the
result of a great number of causes : an increasing population, the 
fact that wheat imported is only complementary to the home 
product, which is dominated by the English weather, the varia- 
tion of harvests all over the world, political events, the fall in the 
value of silver, the development of means of communication and 
transport, and all the other causes which determine price. Notice 
how the effects of all these can be traced. A deficient home 
harvest synchronized with the rise in 1871-73, a world-wide defi- 
ciency with the fall in 1880, a good home product with the fall in 
1875. The separation of A from C shows the increase of popu- 
lation. The American Civil War is marked in 1865, the general 
improvement in transport by the rise before 1875, the fall of
prices by the fall since 1878, while a reaction is seen since 1895.
These various causes, however, often tend to neutralise one another.

[Diagram: Importation of Wheat and Wheat Flour, 1862-1906. Vertical scales: millions of £ or cwt., and lbs. per head. Lines: price, shillings per cwt. (line D of the text); value, millions of £ (B); quantity, millions of cwt. (A); quantity, lbs. per head (C). To face page 162.]

Importations of Wheat and Wheat Flour, 1862 to 1906. 

          A.               B.             C.                  D.
Year.     Total Quanti-    Total Value    Quantity retained   Average Price of
          ties Imported.   Imported.      per Head of the     Wheat and Wheat
          Unit,            Unit,          Population.         Flour in Shillings
          100,000 cwt.     £100,000.      lbs.                per cwt.

1862        500              286            185                 11.44
1863        309              155            112                 10.03
1864        288              135            104                  9.37
1865        258              124             93                  9.61
1866        294              168            104                 11.43
1867        391              285            140                 14.58
1868        365              249            130                 13.64
1869        444              233            156                 10.50
1870        369              196            123                 10.62
1871        444              268            151                 12.07
1872        476              303            163                 12.73
1873        516              344            171                 13.33
1874        493              309            162                 12.53
1875        595              324            197                 10.89
1876        519              279            168                 10.75
1877        635              407            202                 12.82
1878        597              342            187                 11.46
1879        730              400            228                 10.95
1880        685              393            209                 11.47
1881        713              407            217                 11.42
1882        808              449            242                 11.11
1883        851              438            252                 10.30
1884        669              301            192                  9.00
1885        823              337            238                  8.19
1886        670              261            188                  7.79
1887        802              314            224                  7.82
1888        804              315            223                  7.82
1889        789              311            219                  7.88
1890        824              327            226                  7.94
1891        895              396            244                  8.85
1892        956              371            243                  7.76
1893        938              308            247                  6.57
1894        967              268            256                  5.54
1895      1,073              302            284                  5.63
1896        996              309            256                  6.21
1897        887              330            227                  7.44
1898        944              377            237                  7.99
1899        985              330            243                  6.71
1900        986              334            244                  6.78
1901      1,011              334            247                  6.60
1902      1,079              360            267                  6.67
1903      1,167              397            287                  6.80
1904      1,182              415            294                  7.02
1905      1,142              413            283                  7.23
1906*     1,120              395            280                  7.10



* Figures for 1906 are approximate. 

[Choice of markings.]

As regards the choice of markings for different lines, the 
chief rule is that lines which cross one another, unless very 
acutely, must be marked differently. The second rule is to 
mark similar quantities in similar ways. Thus in the next 
diagram the lines representing quantities have a resemblance to 
one another, as have also those showing values ; while the two 
lines relating to imports are distinguished 



[Table, p. 164: The Cotton Trade, 1854-1906. Piece Goods Exported ; Raw Cotton Imported.]

[Diagram: History of the Cotton Trade, 1854-1906.]

from those relating to exports. If it is possible to use more 
than one colour this principle can be easily carried out.* 

[The history of the cotton trade.]

This diagram is intended to show the relations between the 
quantities and values of cotton imported and exported during 
forty years. The vertical scale for values is chosen so as to 
bring the whole figure to a convenient size and to mark the 
fluctuations. The value of the raw cotton imported is 
increased, perhaps trebled, by manufacture, and of the finished 
product a large part is used at home, the rest exported. The 
excess of the value of exports over imports therefore 
represents the increment of value due to manufacture (that is, 
the total earnings, or the wages and profits, of the cotton 
industry), less the total value of all cotton goods sold at 
home. When the exports are less in value than the imports, the 
earnings of manufacture are less than the home consumption : 
when equal, equal. Looking at the diagram, it will be seen 
that the value of exports of piece goods exceeded that of 
imports of raw cotton from 1854 to 1860, though often by only a 
small margin ; that the reverse was the case during the cotton 
famine in 1861-66, when extravagant prices were paid for raw 
cotton to partially supply the home market at a high price, 
while the export fell off. Equality was again attained in 
1867, while since 1871 exports have greatly exceeded imports in 
value, the difference being permanently established since 1879. 
It would appear that the home market is saturated, while the 
foreign market has extended. 
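The reasoning about the excess of exports over imports may be set out in symbols. The letters are introduced here for illustration only and are not the author's: let $I$ be the value of raw cotton imported, $M$ the increment of value due to manufacture, $H$ the value of cotton goods sold at home, and $E$ the value of cotton goods exported. The sketch assumes, as the passage does, that the whole finished product is either sold at home or exported.

```latex
\begin{aligned}
I + M &= H + E  &&\text{(value of the finished product)}\\
E - I &= M - H  &&\text{(excess of exports over imports)}\\
E < I &\iff M < H, \qquad E = I \iff M = H .
\end{aligned}
```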

The line representing the value exported may be described 
in a few words : a general and rapid increase took place from 
1850 to 1866, interrupted only by the Civil War ; since 1866 the 
fluctuations have been violent, but the general average fell till 
1898. The effect of the Civil War is well emphasised by all the 
lines here, and is clear also in the diagram facing p. 164. With a 
little experience in the use of diagrams these lines may be 
smoothed by the eye alone. 

[Choice of scale for quantities.]

The unit of the quantity of imports is 1,000,000 cwt. of raw 
cotton ; one-tenth of this can be distinguished in the figure. 
In 1854, 7,993,000 cwt. were imported, and their value was 
£20,175,000. If we represent 2 cwt. by the same vertical 
length as £1, as done in the figure, the lines begin 

* See Wages in the Nineteenth Century, by the present author, diagram 
p. 90. 




at practically the same point. Adopting this scale, we are able 
to see at once the divergence of quantity from value during the 
period. 

In the year 1891, 17,811,000 cwt., valued at £46,081,000, were 
imported. The sum would have bought 18,096,000 cwt. at the 
price of 1854, a difference of only 1½ per cent. ; so that it 
happens in this case that the value and quantity lines are nearly 
together again in 1891. The actual course of prices is shown 
by the lowest line on the diagram. In 1862 quantity falls more 
quickly than value as price rises, and as the supply recovered 
in 1866 value went up before and more violently than quantity, 
owing to the high price. In 1869 quantity rose while value fell, 
but otherwise the lines fluctuate together and continually tend 
towards each other. 

[History of exports.]

The study of quantity and value in exports is more 
interesting. It is not obvious what commodity is the best 
representative of cotton exports. In 1895, 5,000,000,000 yards 
of piece goods valued at £46,700,000, 812,000 pairs of 
stockings valued at £220,000, 23,800,000 lbs. of sewing thread 
valued at £3,000,000, and 250,000,000 lbs. of cotton yarn 
valued at £9,200,000, were exported. A good plan, perhaps, 
would be to take so many yards of piece goods as equivalent to 
so many pounds of yarn, the relative prices being the criterion, 
and to add these together to determine the quantity ; in the 
figure, however, piece goods only are taken. 

In this case there is no simple relation between quantity and 
value at the first date, and there is no simple method of making 
the two scales correspond. Having marked the value line on the 
squared paper in use, it was necessary to draw out a new system 
for the quantities. In 1854, 1,693,000,000 yards were valued at 
£25,055,000. Then 1,693,000,000/25,055,000 yards corresponded 
to £1, i.e., 67.7 yards to a unit ; and each number of yards 
had to be reduced to 
this scale. This is done in practice quite easily by a mechanical 
scale, by which numbers can be automatically reduced in any 
required ratio. The scale is then entered to the right-hand 
side of the figure. It is of course not easy to read the exact 
numbers off the figure, but it can be done with the help of 
a ruler. To avoid this difficulty, the actual amounts can be 
entered on the diagram at critical places. But after all it is 
not the object of the diagram to make it possible to read the 




numbers ; the object is to show the relative rises and falls, and 
the steepness, and to allow comparisons of the lines. The figures 
should be taken from a table. No scale is in reality necessary, 
except for the process of drawing the lines. 
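The reduction of the quantities to the value scale, done in the text by a mechanical scale, may be sketched as follows. The function names are illustrative; the figures are those quoted in the text for piece goods in 1854 and 1895.

```python
def yards_per_pound(base_yards, base_value):
    """Number of yards represented by the same height as one pound sterling."""
    return base_yards / base_value

def reduce_to_value_scale(yards, ratio):
    """Height, on the scale of pounds, at which a quantity of yards is plotted."""
    return yards / ratio

# 1854: 1,693,000,000 yards valued at 25,055,000 pounds (from the text).
ratio = yards_per_pound(1_693_000_000, 25_055_000)

# By construction the quantity and value lines start from the same point.
start_1854 = reduce_to_value_scale(1_693_000_000, ratio)

# 1895: 5,000,000,000 yards, valued at 46,700,000 pounds (from the text).
height_1895 = reduce_to_value_scale(5_000_000_000, ratio)
value_1895 = 46_700_000

print(round(start_1854), round(height_1895), value_1895)
```

On this scale the quantity line stands well above the value line in 1895, which is the divergence the diagram is meant to display.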

[Quantity and value.]

The history of quantity of exports of cotton (and of other 
textiles) is quite different from that of values. Value 
fluctuates and shows no rise since 1866. Quantity fluctuates, 
but not greatly, except at the Civil War, and, except in the 
'60's and in '80-6, shows a general rise. The smoothed curve 
would rise throughout. 

One important cause of this difference is that, as Sir R. Giffen 
has pointed out, a large sum should be in reality deducted from ex- 
port values to allow for the import value of the raw cotton, before 
any conclusions are drawn as to the progress of British manu- 
facture. Now, as we have seen, the price of raw cotton has 
fallen very fast during precisely the period (since 1865) that 
export value has not grown. A greater corresponding deduction 
should be made in the earlier years than in the later, which would 
result in a definite, if fluctuating, rise in the period. This would 
not make values increase so fast as quantities ; the difference is 
due to the general causes of the fall of price of manufactured goods. 
By looking carefully at the diagram it will be seen that the quantity 
line approached that of value, when the price was falling in 1866-68, 
and fell away again with the higher price of 1872; after 1872 
the quantity line gets nearer again, and crosses the value line in 
1875, when the price was the same as in 1850; from 1875 to 
1898, as prices fell, the divergence steadily increased. 

It is too early to analyse the very rapid increase in price, 
values, and quantity imported since 1899, and the fluctuating 
rise in quantity exported. Smoothed lines drawn on the diagram 
would show a very marked improvement over previous decades. 

It must be admitted that a study of the diagram repre- 
senting these figures leads much more rapidly and safely to 
many interesting conclusions than the table on p. 164 of the 
figures themselves. 






3. Comparisons of Series of Figures. 

A. Before proceeding to the study of the next diagram, it will 
be well to define more exactly what is our object in comparative 
studies of figures, and to consider the means at our disposal. 

[Graphical comparisons.]

When dealing with two series of similar quantities, such as 
the course of trade or population in two countries, we wish to 
see the general rate of progress (to be done by smoothing the 
curve), the years of special increase, the dates of maximum and 
minimum, in fact to compare the three things that the eye can 
see : the increase, the rate of increase, and the dates of 
change of rate of increase. The most obvious way to do this 
is to take the same scale and base line for both countries and 
the same unit of measurement ; but this method does not take us 
all the way. We can judge differences, it is true, and the 
additions in all the years in both countries, and we can see 
the highest and lowest points and dates of change of rate of 
increase ; but we cannot compare rates of increase. It is not 
easy to judge ratio, though a rough guess at it is possible. 
Thus if the trade is very different in magnitude in the two 
countries, equal absolute increments will mean very different 
relative increments, and it is difficult to be always on 
one's guard. 

[Percentage scales.]

The remedy for this is to alter the arrangement of scales. 
Make a second figure, in which the unit shall be not a sum of 
money, but a percentage : let 1 per cent. of England's trade, 
say in 1850, be the unit for the English line ; and 1 per cent. 
of the trade of Germany, at the same date, for the German line. 
In other words, express the trade of both countries as 
percentages of their value in a given year, and draw lines to 
represent these percentages. Alongside the diagram two or more 
scales can be placed showing the absolute amounts of the trade 
of each country. Then the rates of increase will be comparable, 
equal increments representing equal percentages of the trade of 
each country ; and, in addition, the dates at which either 
country gained ground relatively to the other can be easily 
picked out. The question as to whether 






absolute rates or relative rates should be studied is a very 
common one in statistics. [Absolute or relative.] Sometimes 
the absolute magnitude should be known, as for instance when we 
want to estimate the effect of measures which will affect the 
well-being of special classes, or the trade of special 
countries ; sometimes the relative rate, as when we want to 
watch the progressive increase of different industries, or to be 
on our guard as to future competitors. The two studies 
generally require two different diagrams, though they may 
represent the same numbers. 



It will be seen that the chief difficulty lies in the choice of 
the year in which the quantities are to be equated ; this must 
be decided by the nature of the argument which the diagram 
is to illustrate. 



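We may compare the following figures : 

Year     1880   1890   1900
A         220    440    330
B         160    240    400

in three ways, thus : 

1. Expressed as percentages of values in 1880. 
2. Expressed as percentages of values in 1890. 
3. Expressed as percentages of values in 1900. 

[Three diagrams, each with a percentage scale and the corresponding scales for A and B. In figure 1, 200 per cent. corresponds to 440 for A and 320 for B, and 150 per cent. to 330 and 240 ; in figure 3, 100 per cent. corresponds to 330 for A and 400 for B, and 50 per cent. to 165 and 200.]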



In figure 3 the fluctuations are seen as percentages of the 
values at the last date, and are thrown into better proportion 
than in figure 1. It is frequently the case that the equating of 
quantities at the most recent date throws what are often small 
beginnings into their right proportion when viewed from the 
modern standpoint. The statement that the values in 1880 
were 40 and 67 per cent. respectively of the corresponding 
present values is in better perspective than the statement that 
the values in 1900 were 250 per cent. and 150 per cent. of the 
corresponding values in 1880 ; but circumstances must decide 
in each case which method is to be adopted. 
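The three ways of expressing the example figures can be checked with a short calculation. This is a minimal sketch; the names are illustrative, and the values for A and B are those of the small table in the text (220, 440, 330 and 160, 240, 400).

```python
YEARS = (1880, 1890, 1900)
SERIES = {
    "A": (220, 440, 330),
    "B": (160, 240, 400),
}

def as_percentages(values, base_index):
    """Express every value as a percentage of the value in the base year."""
    base = values[base_index]
    return [round(100 * v / base, 1) for v in values]

# One table of percentages for each choice of the year of equality.
tables = {
    year: {name: as_percentages(vals, i) for name, vals in SERIES.items()}
    for i, year in enumerate(YEARS)
}

for year, table in tables.items():
    print("base", year, table)
```

With 1880 as base, the 1900 values stand at 150 and 250 per cent. ; with 1900 as base, the 1880 values stand at about 67 and 40 per cent., which are the two statements contrasted in the text.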

[Illustration from trade.]

These points are fully illustrated by the annexed diagrams, 
the object of which is to analyse the progress of our trade with 
the colonies and with foreign countries, especially with 
Germany. The first figure shows the total imports and exports, 
and the parts of each which are colonial and foreign, the scale 
in millions of pounds being the same for all the lines. A line 
is also given for imports from Germany, Holland, and Belgium ; 
these are grouped together, because it is not possible to 
distinguish in the returns from the two latter home manufactures 
from German goods in transit. It is not clear from this diagram 
which part of our imports has increased most rapidly. The 
three lines are, therefore, redrawn in the second diagram, on a 
percentage scale, all the values being expressed as percentages 
of the corresponding values in 





1905. It is now seen that imports from foreign countries 
and from our colonial possessions and India have marched 
together except during the period of the cotton famine, but the 

Imports and Exports, 1862-1905. 

Unit in all columns, £100,000. 

        Total     Total Exports  Exports to    Exports to  Imports from  Imports from  Imports from
Year.   Imports.  including      British       Foreign     British       Foreign       Germany, Holland
                  Re-exports.    Possessions.  Countries.  Possessions.  Countries.    and Belgium.

1862    2,257     1,662            454         1,207         653         1,604         279
1863    2,489     1,969            550         1,419         847         1,642         283
1864    2,749     2,126            557         1,569         937         1,812         332
1865    2,711     2,188            515         1,673         728         1,982         364
1866    2,953     2,389            572         1,817         722         2,231         388
1867    2,752     2,258            534         1,724         607         2,144         373
1868    2,947     2,278            537         1,741         670         2,277         379
1869    2,955     2,370            519         1,851         704         2,250         405
1870    3,033     2,441            554         1,887         648         2,384         409
1871    3,310     2,836            556         2,280         729         2,581         469
1872    3,547     3,146            656         2,490         794         2,753         455
1873    3,713     3,110            711         2,399         810         2,903         463
1874    3,701     2,977            779         2,197         822         2,879         494
1875    3,739     2,816            767         2,050         844         2,895         515
1876    3,752     2,568            701         1,866         843         2,908         516
1877    3,944     2,523            758         1,766         896         3,049         590
1878    3,688     2,455            720         1,735         779         2,908         575
1879    3,630     2,488            665         1,823         789         2,840         543
1880    4,112     2,864            815         2,049         925         3,187         616
1881    3,970     2,971            867         2,104         915         3,055         582
1882    4,130     3,067            923         2,143         994         3,136         658
1883    4,269     3,054            904         2,150         987         3,282         692
1884    3,900     2,960            883         2,077         958         2,942         646
1885    3,710     2,715            885         1,860         844         2,866         638
1886    3,499     2,690            822         1,867         819         2,680         609
1887    3,622     2,813            823         1,990         838         2,784         646
1888    3,876     2,986            917         2,068         869         3,007         684
1889    4,276     3,156            908         2,248         973         3,304         715
1890    4,207     3,283            945         2,337         962         3,245         694
1891    4,354     3,091            933         2,158         995         3,360         716
1892    4,238     2,916            812         2,104         979         3,259         715
1893    4,047     2,771            786         1,986         919         3,128         720
1894    4,083     2,73[?]          786         1,952         940         3,143         716
1895    4,167     2,858            761         2,098         957         3,210         729
1896    4,418     2,964            907         2,057         933         3,485         761
1897    4,510     2,941            871         2,071         941         3,569         760
1898    4,705     2,940            901         2,038         998         3,708         786
1899    4,850     3,295            943         2,352       1,069         3,781         834
1900    5,231     [illegible]    1,021         2,523       1,096         4,134         861
1901    5,220     3,479          1,132         2,347       1,057         4,163         897
1902    5,284     3,492          1,176         2,317       1,069         4,215         950
1903    5,426     3,604          1,195         2,409       1,137         4,289         973
1904    5,510     3,710          1,208         2,502       1,200         4,310         962
1905    5,650     4,076          1,227         2,849       1,279         4,372         990



trade from Germany, &c., has increased more rapidly than either. 
If we had equated the quantities in 1862, the German line would 
have far outpassed the others by 1905 ; but the impression given 
would be erroneous as regards absolute quantities, for the 
increase was only £71,100,000 for the one, while it was 
£277,000,000 for all foreign countries. The remaining diagram 
shows the relative rates of increase for Germany, Holland and 
Belgium, and the British possessions respectively, since 1870. 
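The contrast between the absolute and the relative increase can be verified from the first and last rows of the table of imports and exports. The sketch below uses the entries for 1862 and 1905 (unit £100,000); the names are illustrative.

```python
UNIT_POUNDS = 100_000  # every table entry is in units of 100,000 pounds

# (imports in 1862, imports in 1905), taken from the table
IMPORTS = {
    "Germany, Holland and Belgium": (279, 990),
    "All foreign countries": (1604, 4372),
}

def absolute_increase(first, last):
    """Increase in pounds sterling between the two dates."""
    return (last - first) * UNIT_POUNDS

def relative_level(first, last):
    """Value at the later date as a multiple of the value at the earlier."""
    return last / first

summary = {
    name: (absolute_increase(a, b), relative_level(a, b))
    for name, (a, b) in IMPORTS.items()
}

for name, (increase, multiple) in summary.items():
    print(f"{name}: +{increase:,} pounds, {multiple:.2f} times the 1862 value")
```

The group including Germany shows the larger multiple but much the smaller absolute addition, which is why a diagram equated in 1862 would mislead as to absolute quantities.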

[Causal relations.]

B. Series of figures are often compared graphically with a 
view to discovering or illustrating causal relations. In such 
cases we do not only study relative growth as in the last 
diagram discussed, but look throughout the period for any signs 
of resemblance in rates of growth, dates of maxima and minima, 
or synchronism in any changes. The methods by which such 
comparisons are made are difficult, and need careful analysis. 
For instance, we may wish to show that an increase of the 
allowance for outdoor relief is connected with an increase of 
pauperism. In this case one line will represent money, the 
other the number of persons, and there is no common unit ; we 
need not calculate percentages, but having chosen any scale for 
money, we can make equality in any year by a simple adaptation 
of the scale for number. We shall wish to establish first, that 
an increase or decrease of money occurred at, or just before, an 
increase or decrease in number ; and secondly, that the greater 
the increase of one the greater the increase of the other. In 
order to show direct connection, we shall try to make one line 
lie as nearly as possible over the other. 

Draw a preliminary diagram in which both lines are entered 
on any scales ; this will suggest the resemblances to be tested. 

Notice in what period the fluctuations are greatest ; 
this in general should be the period to be taken, 
for it is here that the causal relations have had most play. 
If any other period is chosen for any special reasons, these 
should be made clear, for otherwise a critic may legitimately 
object that it is only in this period that the connection is 
distinct. There would be little difficulty in finding short 
periods in any two curves where the fluctuations synchronised. 
Take the averages of both money and of number over the 
period chosen, and draw a second diagram in which the scale 
for number is chosen by making this average for number equal 
to the corresponding average for money. Any correspondence 
between the two lines can be at once detected. 

The process just described is completely carried out in the 
first two diagrams comparing the marriage rate and foreign trade 
facing p. 175. 



[Diagrams, to face page 172. Figure: Imports as percentages of their total values in 1905, from British Possessions, from Foreign Countries, and from Germany, Holland and Belgium. Figure: Imports from British Possessions and from Germany, Holland and Belgium, scales chosen for equality in 1870, with a separate scale for Germany.]




[Inverse relations.]

There are many cases when the changes in the magnitudes 
which we regard as the causes are inversely proportional (in 
the opposite sense) to those in the magnitudes which we regard 
as the effects. For instance, if we are comparing trade 
improvement with the number of unemployed, and make the 
construction just described, the maxima of the first line would 
synchronise with the minima of the second. Greater clearness 
can be obtained by inverting one of the diagrams, plotting out 
the number employed instead of that unemployed, and then the 
changes should be in the same sense in both lines. 

[More complex relations.]

In the above construction the lines will only lie one over the 
other throughout their fluctuations, if the changes in one 
quantity are in strict proportion to the changes in the other, 
if an increase of 10 per cent., for instance, in the allowance 
for outdoor relief corresponded to one of 10 per cent. in the 
number of paupers. It is very rare that such a simple relation 
is found ; all we can see in general is that the maxima and 
minima occur at the same dates, that the fluctuations agree 
throughout in sense in both series, and that the greater 
fluctuations in the one correspond to the greater fluctuations 
in the other. 

[Use of diagrams.]

Diagrams may often be used to suggest correlation between 
two series of figures, and this indeed is one of their chief 
merits, and they may be used to illustrate arguments on any 
subject, but at this point their utility ends, for they cannot 
be made to prove much. Causal relations are very difficult to 
establish, and the original figures must be critically 
consulted when theories are to be brought to the test. 

[More exact method.]

We have not yet exhausted the power of diagrams for making 
such comparisons, but the following method must be applied only 
with great caution. Suppose that we wish to ascertain whether 
an increase of 1 bushel in the quantity of wheat to be bought 
for a sovereign corresponded to an increase of 1.5 in the 
marriage rate per 1,000, or any such strict numerical 
proportion. Draw a diagram representing the quantities of 
wheat, take the average for the period chosen for comparison, 
and write the scale so as to read 1, 2, 3 . . . bushels above or 
below the average. Draw no base line. Now enter a line to 
represent the excess or defect of the marriage rate from its 
average in the chosen period, on a scale such that 





1.5 in excess is represented by the same vertical distance as 
1 bushel. The closeness of the two lines would test to what 
extent the theory was valid. The danger of this method is, that 
with no base line there is no possibility of judging the amounts 
of the changes relative to the totals. The insertion of the 
necessary two base lines would confuse rather than aid. 



Marriage Rate, Total Exports and Imports per Head of Population, and Average Price of Wheat per Quarter. 

Year.    Marriage    Total Exports and      Average Price of
         Rate.       Imports per Head.      Wheat per Quarter.
                     £   s.  d.             s.   d.

1860     17.1        13   8 †               53    3
1861     16.3        13   3 †               55    4
1862     16.1        13   8 †               55    5
1863     16.8        15   2   7             44    9
1864     17.2        16   8   7             40    2
1865     17.5        16   7   5             41   10
1866     17.5        17  14   5             49   11
1867     16.5        16   9   6             64    5
1868     16.1        17   6 †               63    9
1869     15.9        17   3   9             48    2
1870     16.1        17  10   3             46   10
1871     16.7        19   9   6             56    8
1872     17.4        21   0   0             57    0
1873     17.6        21   4   2             58    8
1874     17.0        20  11 †               55    8
1875     16.7        19  19   4             45    2
1876     16.5        19  10 †               46    2
1877     15.7        19   5   5             56    9
1878     15.2        18   2   1             46    5
1879     14.4        17  16  10             43   10
1880     14.9        20   3   3             44    4
1881     15.1        19  17   5             45    4
1882     15.5        20   8  10             45    1
1883     15.5        20  13   2             41    7
1884     15.1        19   4   1             35    8
1885     14.5        17  16   9             32   10
1886     14.2        17   0  10             31    0
1887     14.4        18  11   7             32    6
1888     14.4        18  12   1             31   10
1889     15.0        19  19   9             29    9
1890     15.5        19  19   7             31   11
1891     15.6        19  14   0             37    0
1892     15.4        18  15   6             30    3
1893     14.7        17  14   9             26    4
1894     15.1        17  11   9             22   10
1895     15.0        17  19   3             23    1
1896     15.8        18  14   1             26    2

† One figure, apparently a nought, is wanting in this entry as it stands in the copy ; its place cannot be determined, and the two remaining figures are given as they stand.




It is clear from the preceding analysis that, by the choice 
of scales and base lines, the points at any two dates may be 
made to coincide on any number of accurately drawn lines 
representing series of figures. 

The preceding paragraphs are completely illustrated by the 
adjoining diagram. 

[Illustration of method.]

On the left are given lines representing the price of wheat in 
shillings per quarter, the total of values of exports and 
imports divided by the population, and the marriage rate per 
1,000. The scales chosen are simply those which are easiest to 
use, and throw the lines into proper relief. The points in each 
scale for the same years are over one another, but the scales 
differ. The base lines need not coincide. 

[Marriage rate and trade.]

We can see at a glance whether there is resemblance between 
the courses of these figures. There is at any rate a general 
correspondence between the fluctuations of trade and of the 
marriage rate since 1870, and possibly earlier. There are 
points of likeness between wheat prices and trade ; in 1870-73 
both rise together, and fall in 1873-75 ; both rise in 1876-77, 
fall in the following two years, and then rise again ; both fall 
from 1881 to 1886 and then rise. There are also many cases in 
which the motions do not agree, especially 1862-64, and 1887-89. 

[Marriage rate and wheat.]

If we look now at the price of wheat and the marriage rate, 
which in the earlier part of the century used to be closely 
related, the one rising when the other fell, we see that there 
is no great resemblance either in this or the contrary sense. 
In 1860-62 and in 1862-64 wheat rose and fell, while the 
marriage rate fell and rose ; wheat rose in 1865-67, while the 
marriage rate was first stationary and then fell a little ; then 
it continued to fall in 1868-70, though wheat was falling also ; 
in 1870-80 the marriage rate shows one long, wheat two short, 
fluctuations. Since 1880, in years in which wheat fell, the 
marriage rate in general fell also, and vice versa. 

[Connecting links.]

Let us consider for a moment the possible links of connection 
between these phenomena. When wheat was the chief object of 
expenditure of the working class, its price was the chief thing 
for them to consider ; and so when wheat rose the marriage rate 
fell. On the other hand, now that wheat is cheap and wages 
higher, a change in the price of the loaf is only of great 
importance to a minority ; 




it is now the general prosperity of the country, well indicated by 
the condition of foreign trade, that raises the marriage rate. 

When exports and imports are increasing in value, trade is 
stimulated, and in spite of rising prices, marriageable people are 
sanguine that the prosperity will remain and the prices fall ; but 
when the prices fall, so do the profits and incomes, and marriage- 
able people are more prudent. For these reasons we may expect 
the marriage rate and foreign trade lines to resemble each other. 

Now the increase of the marriage rate corresponding to an 
inflation of trade, and an inflation of trade to a time of rising 
prices in general, we shall find the price of wheat in particular, 
which is connected with the course of prices in general, rising 
when trade is inflated and falling when it is depressed, and 
therefore rising and falling with the marriage rate. But since 
the price of wheat is influenced also by special causes, it will not 
always correspond to the state of trade, and still less to the 
marriage rate, with its former tendency to opposite variations. 

There is no need then for surprise that the curves marriage 
rate and trade correspond ; that wheat and trade correspond, 
but less closely ; and that wheat and marriage show a double 
tendency. The correspondence between marriage and trade is 
investigated on the diagram. That between wheat and trade 
should be done on an identical method. Marriage and wheat 
should be compared twice on different plans: first for direct 
correspondence, and then by redrawing the wheat curve with its 
base line at the top for inverse correspondence. 

[Construction of diagram.]

To effect the comparison between the course of trade and the 
marriage rate, the following steps are taken. On examining the 
two curves on the first figure, it is seen that the resemblance 
does not begin before 1869 ; the parts of the curves since 1869 
should therefore be brought into close correspondence. The 
average marriage rate, 1869-94, is 15.5, and average imports and 
exports per head, £19. The marriage curve is drawn in the 
ordinary way ; then with the help of a sliding scale the trade 
curve is put in, so that with the same base line £19 falls on 
the 15.5 line. 
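The work of the sliding scale may be sketched numerically. The two averages (15.5 and £19) are those given in the text, and the four years shown are taken from the table of marriage rate and trade per head; the function names are illustrative, and the choice of these four years is made here only for the example.

```python
def pounds(l, s, d):
    """Convert pounds, shillings and pence to decimal pounds."""
    return l + s / 20 + d / 240

AVERAGE_MARRIAGE_RATE = 15.5   # 1869-94, from the text
AVERAGE_TRADE_PER_HEAD = 19.0  # pounds, 1869-94, from the text

def trade_on_marriage_scale(trade):
    """Height of a trade figure when 19 pounds is made to fall on 15.5,
    both lines sharing the same base line (zero)."""
    return trade * AVERAGE_MARRIAGE_RATE / AVERAGE_TRADE_PER_HEAD

# year: (marriage rate, trade per head), from the table
POINTS = {
    1873: (17.6, pounds(21, 4, 2)),
    1879: (14.4, pounds(17, 16, 10)),
    1883: (15.5, pounds(20, 13, 2)),
    1886: (14.2, pounds(17, 0, 10)),
}

scaled = {year: trade_on_marriage_scale(t) for year, (_, t) in POINTS.items()}

for year, (marriage, _) in POINTS.items():
    print(year, marriage, round(scaled[year], 2))
```

In 1883 and 1886 the rescaled trade figure lies further from 15.5 than the marriage rate does, in agreement with the remark that after 1879 the trade line fluctuates further from its average.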

The result is that the curves are seen to rise and fall at the 
same dates, but not to the same extent; for, while the lines 
keep nearly parallel from 1873 to 1879, the falls from the 
maximum being equal, after 1879 the trade line fluctuates further 
above and below its average than the marriage rate does. 






[Side-note: Final comparison.]
It remains to test graphically whether the changes are proportional
to one another. An equation of scales may be obtained in the
following manner:

                     MARRIAGE RATE.

    Year.       Maxima.    Minima.    Differences.
    1869          ...       15.9
    1873         17.6        ...          1.7
    1879          ...       14.4          3.2
    1882-3       15.5        ...          1.1
    1886          ...       14.2          1.3
    1891         15.6        ...          1.4
    1893          ...       14.7           .9

                  Average of differences, 1.6

             IMPORTS AND EXPORTS PER HEAD.

    Year.       Maxima.      Minima.     Differences.
                £  s. d.     £  s. d.     £  s. d.
    1867          ...        16  9  6
    1873        21  4  2       ...         4 14  8
    1879          ...        17 16 10      3  7  4
    1883        20 13  2       ...         2 16  4
    1886          ...        17  0 10      3 12  4
    1889        19 19  9       ...         2 18 11
    1894          ...        17 11  9      2  8  0

                  Average of differences, £3 6 3

Represent £3 6s. 3d. on the same scale as 1.6; then £1 and .5 will be
represented by the same height.

This is making the hypothesis that a change of £1 in the total
trade per head synchronises with a change of .5 in the marriage 
rate per thousand. The scales so chosen are marked above and 
below the common average line in the right-hand figure. 
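
The arithmetic behind this equation of scales can be checked with a short sketch. The turning-point values are those of the two tables above; the 1882-3 marriage maximum of 15.5 is taken as implied by the printed differences of 1.1 and 1.3, and the function names are illustrative only.

```python
# Turning points of the marriage rate (per thousand), 1869 to 1893.
# Assumption: the 1882-3 maximum, 15.5, is inferred from the differences.
marriage_turns = [15.9, 17.6, 14.4, 15.5, 14.2, 15.6, 14.7]

# Turning points of imports and exports per head as (pounds, shillings, pence).
trade_turns = [(16, 9, 6), (21, 4, 2), (17, 16, 10), (20, 13, 2),
               (17, 0, 10), (19, 19, 9), (17, 11, 9)]


def to_pence(lsd):
    """Convert (pounds, shillings, pence) to pence: 20s. = 1 pound, 12d. = 1s."""
    pounds, shillings, pence = lsd
    return (pounds * 20 + shillings) * 12 + pence


def mean_swing(values):
    """Average absolute difference between successive turning points."""
    swings = [abs(b - a) for a, b in zip(values, values[1:])]
    return sum(swings) / len(swings)


marriage_swing = mean_swing(marriage_turns)                      # about 1.6
trade_swing_pence = mean_swing([to_pence(t) for t in trade_turns])
trade_swing_pounds = trade_swing_pence / 240                     # about 3.31

# Marriage-rate units to be drawn at the same height as one pound of trade.
per_pound = marriage_swing / trade_swing_pounds

print(round(marriage_swing, 1), round(trade_swing_pounds, 2), round(per_pound, 2))
```

The ratio comes out a little under one half, which the text rounds to .5 for convenience of drawing.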

It is now seen that the fluctuations since 1880 lie more 
closely together in the two curves, but that this closeness has 
been obtained by the partial sacrifice of the years 1872-80, and 
there is now a complete disagreement before 1870. A yet 
shorter period, 1879- 1893, would show a very close agreement; 
but so special a selection would vitiate any general argument. 

Our conclusion is, that since 1870 the causes which affect 
foreign trade have also affected the marriage rate at the same 
dates and in the same sense, and that the more marked the 
effects on the one, the more marked are the effects on the other 
also, but that there is no law of simple proportion between them. 

Note. The relations tested by the middle diagram may be
represented by the equation $x/a = y/b$, and that of the right-hand
diagram by $(x-a)/(y-b) = c$ (a constant), where $x$ and $y$ stand for the value
of trade and the marriage rate, and $a$ and $b$ for their average
values, and c is chosen so as to make the average fluctuations of 
the two sets of quantities equal. By the method of least squares 
c could be chosen so that the correspondence should be closer 
than with the value given by the calculation in the text. 

See note on p. 193 for further methods of making comparisons. 




178 ELEMENTS OF STATISTICS. 



4. Periodic Figures. 

We now come to the consideration of periodic figures; that
is, of figures which within a given period, in a year for instance
when returns are monthly, reach maxima and
minima at assigned times, and show fluctuations
recurring with regularity in successive periods. In physical
phenomena, such as the sunrise, the same daily numbers will
represent the phenomena, almost without change, year after
year. In the case of the tides we find a link between the
more rigid annual curves of seasonal phenomena, and the less
marked periods of social statistics; for the tides are subject to
separate influences with periods of 24 hours, 24 hours 50 min.,
29 days, 1 year, and others, and the effects of these influences
are often masked one by the other. In the weekly figures of
the Bank of England, Jevons discovered monthly, quarterly, and
annual periods.*

In social and industrial statistics we usually find an annual 
period, combined with a general slow movement upwards or 
downwards, and confused by an irregular period of about ten 
years, due to alternate inflation and depression of trade. The 
influences of these three movements on the resulting numbers 
can be investigated, and the general methods of examining 
periodic figures fully explained by the complete discussion of one 
example, viz., the monthly returns of want of employment of the 
Friendly Society of Ironfounders. For another example the 
reader is referred to Jevons' essay, On the Frequent Autumnal 
Pressure in the Money Market ;* and for an exercise, to the 
monthly gazette wheat prices, where the gradual change of the 
shape of the annual diagram can be traced in relation with 
the increasing influence of harvests in all the quarters of the 
globe. 

[Side-note: General features of the figures.]
These figures are specially suitable for showing graphically
a double period, and the influences of rapid annual fluctuations and
general movements of longer period on each other.
Looking at the table on p. 179 along the lines for
the several years, it will be seen that there is always a fall in the
middle of the year. Looking down a vertical column under any

* See Investigations in Currency and Finance.






PERIODIC FIGURES. 



179 



Number of Unemployed Ironfounders, expressed as percentages 
of estimated total number of members, month by month : calculated 
from figures given in the Annual Report of the Friendly Society of 
Ironfounders, 1894. 





























Year.      Jan.  Feb.  Mar. April   May  June  July  Aug. Sept.  Oct.  Nov.  Dec.   Average
                                                                                   for Year.
1855       11.1  14.1  14.0  12.5  10.?   9.9   8.7   8.7   6.8   7.7   8.8  12.0     10.4
1856       10.9  12.6  12.2  10.?   9.4   7.5   6.9   7.3   6.9   8.1   8.7   9.9      9.2
1857       10.1   9.5   8.7   8.7   8.1   7.3   6.8   6.9   6.2   8.0  14.0  17.7      9.3
1858       20.2  20.6  20.9  19.8  20.3  17.8  15.9  14.3  13.1  11.9  11.5  11.2     16.5
1859       10.6   8.8   6.5   5.2   4.0   4.4   3.2   3.6   3.4   3.8   4.6   5.1      5.3
1860        4.0   3.2   2.6   2.2   1.6   1.7   2.3   2.6   2.6   2.9   3.7   5.6      2.9
1861        6.0   6.9   6.5   7.9   7.8   8.4   6.9   7.9   9.5  10.7  12.4  13.8      8.7
1862       14.5  14.0  14.0  14.6  14.4  13.7  13.3  12.9  12.2  13.5  14.9  16.0     14.0
1863       15.5  13.9  13.6  11.6  10.4   9.3   8.1   7.8   7.4   6.6   5.3   5.0      9.5
1864        6.0   7.1   6.6   5.?   4.4   3.3   2.8   2.8   2.6   3.3   4.2   8.1      4.7
1865        5.4   5.3   5.3   4.6   3.4   2.9   2.6   3.1   2.7   2.6   2.3   4.9      3.8
1866        4.2   5.4   5.1   3.6   5.1   6.5   5.9   6.5   6.9   7.4   9.3  13.8      6.7
1867       12.4  13.2  15.4  16.7  14.9  14.6  14.2  13.9  15.7  16.3  18.9  22.6     15.7
1868       22.1  20.9  19.8  18.6  16.7  15.8  14.9  14.7  14.2  14.1  15.6  17.4     17.1
1869       17.3  17.1  16.8  15.6  15.2  13.6  13.3  11.8  13.1  13.6  14.8  15.3     14.8
1870       14.5  10.9   8.7   7.2   5.0   4.5     ?   4.5   4.9   5.0   5.6   8.3      6.9
1871        7.2   5.6   3.6   2.8   1.6   1.5   1.6   1.2    .9   1.4   1.1   2.2      2.6
1872        1.1   1.1    .9    .8   1.2    .7    .9   1.0   1.3   1.8   2.6   4.1      1.5
1873        3.3   2.8   2.7   2.5   2.1   2.0   3.0   4.9   4.3   3.3   3.3   5.1      3.3

Average
1855-73    10.3  10.2   9.7   8.9   8.2   7.7   7.1   7.2   7.1   7.5   8.5  10.4      8.6

1874        4.9   3.9   3.9   3.5   4.9   3.8   3.8   3.4   3.5   3.7   3.9   5.0      4.0
1875        4.6   3.4   3.5   2.8   2.8   2.8   3.3     ?   3.6   4.1   4.1   5.0      3.6
1876        4.9   4.9   4.9   5.4   4.8   5.2   5.7   5.8   6.4   6.4   6.2  10.3      5.9
1877        7.7   7.4   7.0   6.9   8.4   7.6   7.4   7.8   9.6  10.9  12.3  16.3      9.1
1878       14.0  14.3  13.5  15.3  13.3  14.6  13.6  13.2  13.3  14.0  15.7  21.0     14.7
1879       23.2  23.8  24.7  25.5  22.3  23.4  21.5  22.6  22.5  21.1  18.0  16.6     22.1
1880       15.2  12.9  11.1  10.0  10.0   9.7   9.8  10.0  10.0   9.2   9.2  10.2     10.6
1881       11.5  10.8  10.1  10.1   7.6   7.5   6.5   5.8   5.6   5.4   5.0   6.6      7.7
1882        5.5   5.2   5.3   4.5   3.6   3.8   3.2   3.4   3.6   4.1   4.4   6.0      4.4
1883        3.6   4.8   5.2   4.3   4.2   3.6   3.9   4.3   4.3   4.2   4.0   6.6      4.4
1884        6.1   6.2   5.9   6.5   6.5   6.9   6.5   7.6   8.1   7.8   9.8  10.9      7.4
1885       10.2  11.1  10.?  10.1   9.8   9.1   9.8  10.7  11.8  11.6  12.7  13.6     10.9
1886       14.1  15.0  15.2  15.5  13.4  13.1  12.1  12.7  13.6  13.9  12.7  12.9     13.7
1887       12.4  11.6  10.2   9.1   9.2  10.6   9.2   8.8   9.6   9.4   9.4   9.1      9.9
1888        7.8   7.5   6.4   6.4   5.9   5.2   5.7   5.0   5.1   4.8   3.2   3.5      5.5
1889        3.1   3.3   2.4   2.1   1.7   1.6   1.7   1.7   1.6   1.5   1.2   1.4      1.9
1890        1.3   1.3   3.2   3.1   2.8   2.4   2.4   2.7   2.7   2.7   2.7   2.7      2.5
1891        3.9   3.5   4.2   4.2   4.6   4.0   4.5   4.8   5.4   5.6   5.7   6.3      4.7
1892        7.0   7.2   7.9   8.1   7.9   7.9   7.7   7.6   9.3  11.4  10.9  12.0      8.7
1893       11.5  11.2  10.1   7.7   9.6   8.3   8.3   9.2  11.7  11.9  11.5  11.5     10.2

Average
1874-93     8.6   8.5   8.2   8.1   7.7   7.6   7.3   7.5   8.1   8.2   8.1   9.4      8.1

Average
1855-93     9.4   9.3   8.9   8.5   7.9   7.6   7.2   7.4   7.6   7.9   8.3   9.9      8.3




month, it will be seen that there is no generally marked ten- 
dency towards increase or diminution, for high and low numbers 
occur in the first as well as the last few years. The most notice- 
able feature of these figures is the alternation of groups of years 
of high and of low numbers. Percentages above 10 will be found
in 1861-63, 1866-70, 1877-81, 1884-87, and 1892-93. Let us choose
for examination the period 1866-70. The figure for January 
1866 is below the Januaries of previous years ; those of February, 
March, and April are also low; from May to September the figures 
are greater than those of 1865 or 1864 ; from October to Decem- 
ber they are greater than those of 1863, 1864, or 1865 ; in De- 
cember 1867 they are greater than any previous year. Most of 
the figures for 1868 beat the record up to that date; but from 
September 1868 the figure is lower than the one twelve months 
earlier till July 1872. This wave of unemployment then lasted 
from May 1866 to September 1872. 

[Side-note: Seasonal influence.]
Now let us watch the seasonal influence. In 1866 there
was no fall in the summer except in April, and there was a very
rapid rise in December. In 1867 a fall in May
and a slight fall from June to August was followed
by a rapid rise in November and December. There is a fall 
from December 1867 to September 1868, but a rise follows in 
October, November, and December; since the rise does not 
generally begin till August, it will be seen that the general 
fall did not much delay the seasonal effect. In the next year, 
1869, there is a fall to a lower minimum in August, but now
the rise in December is very slight; next year the fall is very
quick to August, but the seasonal rise is not delayed. From 
this it is clear that the seasons had their effect throughout the 
fluctuation except in the opening year 1866, when there was 
no fall, and that the rises in the autumn were very much 
accentuated. Almost identical remarks would apply to the 
period August 1875 to May 1881. In what month was the con- 
dition of employment 1867-70 at its worst? The greatest figure 
given is 22.6 per cent, in December 1867, but unemployment 
in December is generally greater than in any other month, and 
the figures for any of the following six months may be more 
unusual ; the determination of the exact date will be best shown 
by diagrams. It may be mentioned that most of these remarks 
were suggested by Mr Hey, the former secretary of the Iron- 
founders' Society, who drew up these figures. 




[Side-note: The story from the diagram.]
If we now turn to the diagram, the following facts may be
noticed. The thick line showing the annual average percentages
shows a downward tendency till 1857, followed
by an abrupt rise and fall in 1858, then
three years' rise to its original height, returning to a minimum 
in 1865 ; the next wave covers six years, and is marked by an 
extraordinarily sharp rise in 1867, and a very low minimum in 
1872. The exceptional condition of trade in 1872 could not 
last, but the rise is very gradual to 1876, when the next cycle 
of trade is marked again by a six years' wave: the rise is
not so steep as in the former fluctuation, but lasts longer, and 
a higher point is reached : the fall is at about the same angle, 
and the minimum in 1882 is about the same as that in 1865. 
The next wave came before it appeared to be due, and lasted 
seven instead of six years, but was much more moderate, and 
again the rise was sharper than the fall. The minimum of 
1889 did not endure, and the figure ends with a suggestion 
that the maximum will be in 1894, but only at a moderate 
height, and the next minimum might be expected in 1898 
or 1899, if causes similar to those which influenced earlier trade 
depressions were still acting. It may be found, in fact, from 
the Board of Trade returns, that, taking all the trade unions who 
made returns together, the maximum month was December 1892, 
and the maximum year was 1893 ; after this the fall is regular 
to 1897, and a trifling rise in 1898 is followed by a very low 
figure for 1899.* 

In figure 5 the diagram is inverted and greatly compressed, 
showing now the percentage employed. If the period 1876-82 
is cut off by two vertical lines, readers may see how great were
the amounts of labour lost to the country and wages to the
members of the Ironfounders' Society in those years. These
figures show a want of employment due to special causes in this 
Society more than twice as great as in other Unions whose 
returns are available for the same period. 

In figure 5 the annual averages are smoothed by the method 
explained above (p. 152), a seven-yearly average† being taken



* See Annual Abstract of Labour Statistics, 1895, p. 73, for various
methods of treating these figures similar to those here discussed.

† For smoothing and studying periodic curves, see Professor Poynting's
paper in Statistical Journal, 1884.






to correspond to the general wave length. It will be seen that 
there is no very marked tendency up or down in the thirty-nine 
years, and that the smooth line is never far from the general 
average of employment, 91.7. 
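
The smoothing described here can be sketched as follows, assuming that the seven-yearly average is a simple centred mean of each year with the three years on either side (the method of p. 152 is not reproduced in this section, so this is an assumption). The annual figures are the yearly averages of the table on p. 179.

```python
# Annual average percentage unemployed, 1855 to 1893 (last column of the table).
annual_unemployed = [
    10.4, 9.2, 9.3, 16.5, 5.3, 2.9, 8.7, 14.0, 9.5, 4.7,
    3.8, 6.7, 15.7, 17.1, 14.8, 6.9, 2.6, 1.5, 3.3,
    4.0, 3.6, 5.9, 9.1, 14.7, 22.1, 10.6, 7.7, 4.4, 4.4,
    7.4, 10.9, 13.7, 9.9, 5.5, 1.9, 2.5, 4.7, 8.7, 10.2,
]
first_year = 1855


def centred_average(values, span):
    """Simple centred moving average; span is assumed to be odd."""
    half = span // 2
    return [sum(values[i - half:i + half + 1]) / span
            for i in range(half, len(values) - half)]


general_unemployed = sum(annual_unemployed) / len(annual_unemployed)
general_employed = 100 - general_unemployed

# Smoothed percentage employed, one value for each year from 1858 to 1890.
smoothed_employed = [100 - v for v in centred_average(annual_unemployed, 7)]

print("general average of employment:", round(general_employed, 1))
for offset, value in enumerate(smoothed_employed):
    print(first_year + 3 + offset, round(value, 1))
```

Three years are lost at each end of the series, since a centred seven-year mean cannot be formed there.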

The comparison of this diagram with that illustrating ex- 
ports (p. 151) is very instructive. Some of the results may 
be thus exhibited : — 





    Dates of       Dates of            Dates of       Dates of
    Minima of      Maxima of           Maxima of      Minima of
    Exports.       Unemployment.       Exports.       Unemployment.

    1862           1858 and 1862       1866           1865
    1868           1868                1872           1872
    1879           1879                1882           1882 or 1883
    1886           1886                1890           1889
    1894           1893









The figures may also be compared graphically by the methods 
of the previous or following sections. 

[Side-note: Measurement of seasonal ...]
The averages for the nineteen Januaries, nineteen Februaries,
&c., in the years 1855-73, and similar averages for the years
1874-93, and the whole period are given in the
table and exhibited in figures 2, 3, 4. When we
calculated the annual averages just discussed we
eliminated by that process the seasonal fluctuations; by this
new series of averages we eliminate the influences of particular 
years. If we took, for instance, all the November numbers out 
of a series of figures totally uninfluenced by the seasons, if such 
could be found, and compared these with the general average 
for all months, we should in the long run find just as many 
instances above as below this average ; but if the figures were 
influenced by the seasons, we should find a considerably greater 
number above than below, or vice versa. The greater the 
seasonal influence, the greater would be this excess or defect.
Averaging numbers in this way eliminates the non-seasonal 
causes, for by hypothesis the excesses and defects due to them 
will in the long run balance one another ; and except by 
averaging these cannot be eliminated, unless they can be actually 
calculated. The excess of the November average above the 
general average will be greater than that of October, if the 
seasonal causes exert more influence towards excess in the 
former than in the latter month, and the curve which shows 
these averages will show a resemblance to that which would 




be obtained if the non-seasonal causes were absent. It will 
be only a resemblance for two reasons : first, because in the 
comparatively short series of years with which we are generally 
obliged to be content, a very effective non-seasonal cause will 
leave its mark on the average, as may be seen in the table on 
p. 179; secondly, because seasonal and non-seasonal causes are 
often not independent ; a depression of trade is accentuated by 
a sharp winter ; a bad season in a year of bad trade may increase 
the want of employment greatly and suddenly, while a good 
summer in a prosperous year may reduce it almost to zero. 
In the case we are considering the interaction of causes tends 
to exaggerate the seasonal maximum and diminish the mini- 
mum ; in other cases a contrary effect might be found. 

In figures 2, 3, 4 the curve for the latter half of the year 
is prefixed to that of the calendar year, because the character 
of the yearly waves is seen most clearly from minimum to 
minimum. It may be noticed that the wave in figure 3 is 
less definite in shape and has a smaller rise and fall than that 
of the earlier period shown in figure 2 ; it would appear that 
the seasons are losing their influence. 
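
The comparison of the two periods may be put in figures with a small sketch, offered only as an illustration. The monthly averages are those of the three "Average" rows of the table on p. 179; the January figure and the general average for 1874-93 are illegible in this copy and are entered here as 8.6 and 8.1, values recomputed from the columns, which is an assumption.

```python
# Monthly averages (January to December) and general average for each period.
# Assumption: 8.6 (January) and 8.1 (general) for 1874-93 are recomputed values.
AVERAGES = {
    "1855-73": ([10.3, 10.2, 9.7, 8.9, 8.2, 7.7, 7.1, 7.2, 7.1, 7.5, 8.5, 10.4], 8.6),
    "1874-93": ([8.6, 8.5, 8.2, 8.1, 7.7, 7.6, 7.3, 7.5, 8.1, 8.2, 8.1, 9.4], 8.1),
    "1855-93": ([9.4, 9.3, 8.9, 8.5, 7.9, 7.6, 7.2, 7.4, 7.6, 7.9, 8.3, 9.9], 8.3),
}


def seasonal_differences(monthly, general):
    """Excess (+) or defect (-) of each monthly average over the general average."""
    return [round(m - general, 1) for m in monthly]


def amplitude(monthly):
    """Rise from the lowest monthly average to the highest."""
    return round(max(monthly) - min(monthly), 1)


for period, (monthly, general) in AVERAGES.items():
    print(period, seasonal_differences(monthly, general), amplitude(monthly))
```

The rise from minimum to maximum is 3.3 in the earlier period and 2.1 in the later, which is the numerical form of the remark that the wave of figure 3 has a smaller rise and fall than that of figure 2.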

[Side-note: The annual wave.]
If there is a definite annual period, that represented by
figure 4, it may be expected that a figure of a shape similar
to this

[Small figure: outline of the annual wave.]

will be repeated annually in figure 1; it is shown well in 1864,
1882, and other years. In the great majority of cases the yearly
maximum is reached in December or January; at
the end of 1858 the maximum is absent, but is
replaced by a break in the rapidity of the fall; at the end
of 1860 there is a rise, but the spring fall following is checked
by the general upward trend; similar remarks apply to all
the great fluctuations. There is no doubt that right along the 
line we find at nearly equal intervals these pointed crests above 
the line of averages. 

The minima are not so conspicuous, for the pointed shape 
is absent, trifling causes bring them near the smoothed line, and 
they are easily masked by a general fall or are absent because 
of a general rise. In 1861, however, there is a distinct minimum 




in spite of the strong upward tendency ; the minima are very 
conspicuous throughout the fluctuation of 1865-70; and from 
1859 to 1888 the minima are fairly marked, except in 1876, 
1880, and 1881.

The following figures show the effect of a stationary, rising, and 
falling average annual rate on the shape of the seasonal wave :— 

[Figures: a, seasonal wave on stationary line of averages; b, superimposed
on rising line of averages; c, superimposed on falling line of averages.
Horizontal scale in each: December, July, December.]

These figures are drawn by adding or subtracting the average
monthly differences from the general average, viz.

    Jan.  Feb.  Mar.  Apr.  May  June  July  Aug.  Sept.  Oct.  Nov.  Dec.
    +1.1  +1.0  +.6   +.2   -.4  -.7   -1.1  -.9   -.7    -.4    0    +1.6

month by month to or from the positions shown on the straight
lines joining the annual averages. On a rising line the spring




fall tends to become horizontal and the autumn rise steeper; 
on a falling line the spring fall becomes more rapid and the 
autumn rise is checked. 
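
The effect of the slope of the line of averages on the shape of the wave can be illustrated by a minimal sketch. It assumes a straight line changing by a fixed amount each month (0.2 per cent. is an arbitrary illustrative slope, not a figure from the text) and adds to it the twelve monthly differences just given.

```python
# Average monthly differences from the general average, January to December.
MONTHLY_DIFF = [1.1, 1.0, 0.6, 0.2, -0.4, -0.7, -1.1, -0.9, -0.7, -0.4, 0.0, 1.6]


def seasonal_wave(start_level, monthly_slope):
    """Seasonal differences superimposed on a straight line of averages."""
    return [start_level + monthly_slope * month + diff
            for month, diff in enumerate(MONTHLY_DIFF)]


def spring_fall(wave):
    """Fall from January to July."""
    return round(wave[0] - wave[6], 1)


def autumn_rise(wave):
    """Rise from July to December."""
    return round(wave[11] - wave[6], 1)


stationary = seasonal_wave(8.3, 0.0)   # figure a
rising = seasonal_wave(8.3, 0.2)       # figure b, hypothetical slope
falling = seasonal_wave(8.3, -0.2)     # figure c, hypothetical slope

for name, wave in [("stationary", stationary), ("rising", rising), ("falling", falling)]:
    print(name, "spring fall", spring_fall(wave), "autumn rise", autumn_rise(wave))
```

With these assumed slopes the spring fall shrinks from 2.2 to 1.0 on the rising line and grows to 3.4 on the falling line, while the autumn rise moves the opposite way.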

If this seasonal wave, added to the slower long-period 
changes, were the complete explanation of these numbers, 
figure I (p. 179) would be entirely composed of modifications 
of figures a, b, and c. Figure a is exemplified especially 
in 1855-57, 1864-65, 1871-73; figure b in 1860-61, 1866-67,
1877-78, 1883-85; figure c in 1859, 1863, 1880-82, 1886-89. 

[Side-note: Elimination of fluctuations.]
As explained above, the two sets of causes are not independent,
and these figures are not reproduced exactly; but the
resemblance is sufficiently close to make the
following method of eliminating seasonal fluctuations
partially applicable. Combine the monthly excesses and
defects just given with the original numbers, by subtracting the
excesses and adding the defects; this process should tend to
produce a straight line, thus:

[Figure: the curve from figure 1, and the corrected figures.]



But the result is not more than a tendency, because of the 
unusual fall in January 1883, and it is difficult to find a perfect 
example. This method is applied in figures 6, 7, and 8 in an 
attempt to disentangle the seasonal fluctuations from the effects 
of the commercial crisis of 1872, the depression of 1879, and the 
turn of the tide in 1883. In figure 6 it is seen that January 1872 
was the best month relatively, though the absolute minimum 
was not reached till June of that year ; from this it appears that 
January 1872 was the turning point of the great inflation, a date 
somewhat earlier than that generally given. The date of the 
maximum of 1879 is left unchanged by this process, and that of 
the 1889 minimum is only shifted one month.
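
The correction can be tried on the monthly figures for 1872 from the table on p. 179. The sketch below simply subtracts the average monthly excess (or adds the defect) from each month; the variable names are illustrative.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "June",
          "July", "Aug", "Sept", "Oct", "Nov", "Dec"]

# Average monthly differences from the general average, January to December.
MONTHLY_DIFF = [1.1, 1.0, 0.6, 0.2, -0.4, -0.7, -1.1, -0.9, -0.7, -0.4, 0.0, 1.6]

# Percentage unemployed, month by month, in 1872.
raw_1872 = [1.1, 1.1, 0.9, 0.8, 1.2, 0.7, 0.9, 1.0, 1.3, 1.8, 2.6, 4.1]

# Subtract the excesses and add the defects.
corrected_1872 = [round(value - diff, 1)
                  for value, diff in zip(raw_1872, MONTHLY_DIFF)]

absolute_minimum = MONTHS[raw_1872.index(min(raw_1872))]
relative_minimum = MONTHS[corrected_1872.index(min(corrected_1872))]

print(corrected_1872)
print("absolute minimum:", absolute_minimum)
print("minimum after correction:", relative_minimum)
```

The uncorrected figures are lowest in June, the corrected figures in January, in agreement with the reading of figure 6 given in the text.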

[Side-note: Criteria of existence of period.]
We have still to discuss the criteria of the existence of a
period. In figure 1 the optical evidence is sufficient to suggest
the annual period, but it may be doubted whether
an annual fluctuation would be suggested by a
diagram representing wheat prices. It is clear that if the




monthly entries of any returns whatever were averaged in 
months over any period of years, that the averages for January, 
February, &c., would not be exactly equal, even if there were 
no seasonal influence. The following diagrams show various 
averages : — 

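[Diagrams: monthly averages of four series, the first being unemployed
ironfounders; the labels of the others are lost in this copy.]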



Of these the first three may be expected to be seasonal, while 
the last, which shows the averages of the dates on which fell the 
first Sunday in 20 Januaries, 20 Februaries, &c., in a series of 
years, certainly is not. 

The following simple tests may be applied to decide this 
point. If the period is in any way connected with the seasons, 
it will correspond to some extent to the ordinary weather charts 
of temperature, &c., which have a single annual maximum and
corresponding minimum. Phenomena affected by the weather 
may also be expected to show a single maximum, nearly coin- 
ciding with the maximum or minimum temperature ; thus the 
maximum unemployed coincides with the minimum length of 
daylight and precedes the minimum temperature. In some 




cases a second subsidiary maximum may be shown, since, for 
example, an excessive death rate may be due to excessive cold 
or heat ; but even in this example further analysis would pro- 
bably show that the one maximum was for the old, the other 
for the young. Wheat prices may also show two minima due 
to the harvests in the two hemispheres. The "Sunday" curve 
just given shows four maxima, and is not seasonal. More than 
one maximum is evidence against periodicity till their existence 
can be explained. 

[Side-note: Probability test.]
The second test is to look at the serial diagram and notice
how often the maximum occurs in the same month; non-periodic
causes will hide the maximum occasionally, but in
the long run one month will be predominant. In
figure 1 the maximum occurs in March and April twice each,
in February three times, in January eleven times, and in Decem- 
ber twenty-one times. The maximum is then generally in 
midwinter. The minimum is not in this case so well defined. 

The following table shows how this analysis can be ex- 
tended : — 

                                                          Times
                                                        out of 39.
The percentage of December is greater than that
    of the preceding November                               33
The percentage of December is greater than that
    of the following January                                28
The percentage of December is greater than that
    of the preceding July                                   33
The percentage of December is greater than that
    of the following July                                   30

The chances against so great a preponderance, if the seasons 
had no influence, are respectively 70,000 to 1, 106 to 1, 70,000
to 1, and 940 to 1.* All the months may be separately tested
in the same way. This method by no means exhausts the 
evidence, for we have only considered which of two months 
is the greater, and not how great is the excess when it exists. 
On this point the reader is referred to the paper by Professor 
Edgeworth, On Methods of Statistics, in the Jubilee Volume
of the Royal Statistical Society, p. 206 ; this should, however, 
be postponed till the mathematical treatment which follows in 
Part II. has been studied. 
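
The chances quoted can be approximated with a short calculation. The sketch assumes that, if the seasons had no influence, December would exceed the other month in any year with probability one half, and that a preponderance in either direction is counted (a two-sided reading); the text does not state its method here, so both are assumptions, though they give results close to the figures quoted.

```python
from math import comb


def chances_against(times, years):
    """One in how many, for so great a preponderance either way,
    if each year were an even chance (assumed two-sided reading)."""
    tail = sum(comb(years, k) for k in range(times, years + 1))
    probability = 2 * tail / 2 ** years
    return 1 / probability


for times in (33, 28, 30):
    print(times, "out of 39: about 1 in", round(chances_against(times, 39)))
```

The three results fall near 70,000, 106, and 940 respectively.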

* See Part II., Sect. I., infra. 






5. Logarithmic Curves. 

[Side-note: Need for graphic representation.]
A serious flaw in the graphic method as used in the previous
sections is that, when we are dealing with a series of increasing
figures, though the totals year by year may be
increasing, we are compelled to represent equal
increments on these totals by equal vertical distances;
thus an increment of £20 on a total of £20 is represented
by the same vertical distance as an increment of £20 on
a total of £2,000. Thus in the annexed figure representing
exports, the fall from £52,000,000 to £42,000,000 in 1815-16 is
barely noticeable, though it is a fall of 20 per cent., and was
connected with very great distress in the manufacturing districts;
while the fall from £305,000,000 in 1883 to £269,000,000
in 1886 attracts attention immediately, though it is one of
12 per cent. only. Again the increase of 34 per cent. which
took place between 1848 and 1850 appears insignificant in comparison
with that of 29 per cent. from 1870 to 1872. When we
are attacking questions of causation it very frequently happens
that we are more concerned to know the proportionate increase
than the actual increase. When we are considering the gradual
growth of our foreign trade, or when we are comparing the
growth of trade of two countries, a diagram like that annexed
is likely to give quite a wrong impression of the struggle that
marked the early stages. We need then a diagram not of
quantities, but of ratios, where equal vertical distances represent
no longer equal absolute increments, but equal proportional
increments, that is, equal rates of increase. By the use of
logarithms a universal scale can be constructed which serves
this purpose. The non-mathematical student can easily accustom
himself to the use of diagrams so constructed, by studying one
where the actual amounts represented are entered, and noticing
that whatever part of the scale he takes, doubling, halving, increasing
by 20 per cent. and so on, are always represented by the
same vertical distances respectively.

[Side-note: Construction of a logarithmic ...]
The construction of a
diagram on this scale is as follows: Write down
the numbers in the series to be represented;
against them write down their logarithms; on
paper divided into equal squares mark at equal intervals on a




vertical line numbers ascending in regular progression so as to
include all the logarithms found; mark off the dates on a
horizontal line; and on the scale thus prepared mark in
the logarithms, instead of the original numbers. The table on
p. 191 and the diagram facing p. 190 show the figures of imports
and exports thus treated. On the right hand of fig. 2 the position
of the absolute numbers is given; on the left the corresponding
logarithms. A given vertical distance, 1 inch, represents
the distance .301 on the logarithmic scale; if we add this
quantity to the logarithm of any number, we obtain the
logarithm of twice that number, for $\log a + .301 = \log a + \log 2 = \log 2a$;
for instance, if we increase the height of
the position which represents £30 by 1 inch, we arrive at the
position which represents £60. Again if we now add 1.59 of an
inch, which represents .477 on the same scale as before, that is
$\log 3$, to the logarithm of $2a$, we obtain $\log 6a$, and we have

$$\log 6a = .477 + \log 2a = .477 + .301 + \log a \ \text{(as above)} = .778 + \log a = \log 6 + \log a;$$

that is, we arrive at the same position on this scale whether we
go by means of two separate ratios or by a single compounded
ratio. Thus a diagram drawn on this principle satisfies the
necessary conditions that equal vertical distances represent the
same ratio in whatever part of the scale they are taken, and
that any number of points can be entered without leading to
inconsistencies. At the end of this section is given a table of
the logarithms of 1 to 1,000, correct to the third decimal place,
which will be found sufficient for this purpose.
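
The properties of the scale just described can be verified numerically. The following is a minimal sketch using common (base 10) logarithms and the text's scale of 1 inch to .301; the function name is illustrative.

```python
import math

LOG_UNITS_PER_INCH = 0.301  # scale stated in the text: 1 inch represents .301


def height_in_inches(value):
    """Height on the diagram at which a value is marked."""
    return math.log10(value) / LOG_UNITS_PER_INCH


# Doubling is always about 1 inch, wherever on the scale it is taken.
doubling_low = height_in_inches(60) - height_in_inches(30)
doubling_high = height_in_inches(2000) - height_in_inches(1000)

# Trebling is about 1.59 inches, and two steps compound into one.
trebling = height_in_inches(6) - height_in_inches(2)
two_steps = (height_in_inches(2) - height_in_inches(1)) + trebling
one_step = height_in_inches(6) - height_in_inches(1)

# The two falls in exports discussed in the text, in millions of pounds.
fall_1815 = math.log10(52 / 42)    # a fall of about 20 per cent.
fall_1883 = math.log10(305 / 269)  # a fall of about 12 per cent.

print(round(doubling_low, 2), round(doubling_high, 2), round(trebling, 2))
print(round(fall_1815, 3), round(fall_1883, 3))
```

On this scale the fall of 1815-16 occupies the greater vertical distance, though on the natural scale it is much the smaller of the two.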

[Side-note: Examples of its use.]
Thus on the diagram given we can see at once that imports
were doubled in value between 1810 and 1836, again between
1840 and 1853, again between 1855 and 1866,
and that their value increased 40 per cent. between
1886 and 1899. Or we may notice that the excess of the
value of imports over that of exports was 40 per cent. of the
latter both in 1850 and in 1880; that the value of imports in
1899 was thrice that of exports in 1860.

If the eye has been carefully educated to understand a
diagram of this sort, if the fact that it is a diagram of ratios,
not of quantities, is firmly impressed on the mind, then the
diagram answers perfectly the object of the graphic method,
that is, it gives a true instantaneous impression of a complex
series of facts. If, on the other hand, it is found that a true




impression is not received, through inability to take the right 
mental position, then diagrams on the natural scale should be 
employed only, always with the recollection that they may give 
false impressions of ratio.* 

[Side-note: Velocity and acceleration.]
It is to be noticed that no base line should be given in
diagrams of this class, otherwise a false impression is at once
obtained. Notice further that, while equal vertical
differences represent equal ratios from any
part of the diagram to any other, instead of equal increments as
on the natural scale, equal degrees of slope represent equal ratios
of increase (equal accelerations), instead of equal additions in
equal times as on the natural scale (equal velocities). On the
logarithmic scale a line rising with convexity to the horizontal
shows that the ratio of increase is growing, as in imports from
1830-1853 (if the line is smoothed), while concavity, as from 1854
to 1873, shows a slackening; but on the natural scale the line is
convex almost throughout the two periods, showing that the
actual increments were increasing all the time.
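
The contrast between the two scales may be illustrated with an invented series growing at a constant ratio (10 per cent. a year is an arbitrary choice, not a figure from the text). On the natural scale the yearly steps grow; on the logarithmic scale they are all equal, so that the line is straight.

```python
import math

# Hypothetical series: 100 growing by 10 per cent. a year for five years.
series = [100 * 1.1 ** year for year in range(6)]

natural_steps = [b - a for a, b in zip(series, series[1:])]
log_steps = [math.log10(b) - math.log10(a) for a, b in zip(series, series[1:])]

print([round(step, 1) for step in natural_steps])  # increasing increments
print([round(step, 4) for step in log_steps])      # equal vertical distances
```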

Useful application to Index-numbers.

It would be useful, if space permitted, to offer several
diagrams on both scales; for in many series of figures the
differences exhibited by the two methods are very instructive.
One case may be signalized where the logarithmic scale is
specially important, that is, when the original numbers represent
ratios, not actual numbers. Thus in Mr Sauerbeck's well-known
diagram, drawn on the natural scale, representing his
index-numbers of prices, all the numbers included are percentages
of their values in certain defined years. Suppose that 100, 80,
and 60 are the index-numbers for three years, then on the natural
scale the decrements are represented by equal distances and
appear to be equal. The changes in the value of gold, however,
are by no means equal in the two periods. In the first, the fall
from 100 to 80 is one of 20 per cent.; 16s. at the second date
would buy goods which cost £1 at the first. In the second, the
fall from 80 to 60 is one of 25 per cent.; 15s. at the last date
would buy goods which cost £1 at the middle date. For the
purposes of price index-numbers it is ratios which are important
and which the diagram should represent.
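The arithmetic of the three index-numbers may be set out as a short sketch. The variable names are illustrative only; the figures 100, 80 and 60 are those of the text.

```python
import math

index_numbers = [100, 80, 60]  # the three index-numbers of the text
pairs = list(zip(index_numbers, index_numbers[1:]))

# Natural scale: both falls are drawn as the same distance.
absolute_falls = [a - b for a, b in pairs]

# The proportional falls differ.
percentage_falls = [100 * (a - b) / a for a, b in pairs]

# Logarithmic scale: the second fall is drawn as the longer distance.
log_falls = [math.log10(a) - math.log10(b) for a, b in pairs]

# Shillings at the later date buying what 20s. (£1) bought at the earlier.
shillings = [20 * b / a for a, b in pairs]

print(absolute_falls, percentage_falls, shillings)
print([round(x, 4) for x in log_falls])
```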

* Professor Marshall suggests a simple method of correcting this false
impression in his paper On the Graphic Method of Statistics, in the jubilee
volume of the Journal of the Royal Statistical Society, p. 257 seq.







LOGARITHMIC CURVES.

Year.      Logarithms.    [Heading illegible.]
1800       1.447          34
1801       [illegible]
1802       1.462
1803       1.415
1804       1.431
1805       1.447          38
1806       1.43?
1807       1.431          37
1808       1.431          37
1809       1.505
1810       1.59?          48
1811-12    [illegible]    33
1813
1814       1.531          45
1815       1.505          52
1816       1.568          4?
1817       1.491
1818       1.568          46
1819       1.491          35
1820       1.505          36
1821       1.491          37
1822       1.491          37
1823       1.556
1824       1.568
1825       1.643          39
1826       1.580          32
1827       1.653          37
1828       1.653
1829       1.643
1830       1.663          38
1831       1.699          37
1832       1.653          36
1833       1.663          40
1834       1.690          4?
1835       1.690
1836       1.756          53
1837       1.740          42
1838       1.785          50
1839       1.792
1840       1.825          5?
1841       1.799          5?
1842       1.806          47
1843       1.832          52
1844       1.869          39
1845       [illegible]    60
1846       [illegible]    58
1847       1.919          59
1848       1.949          53
1849       2.000          64
1850       [illegible]
1851       2.045
1852       2.037          78
[illegible] [illegible]   99

* Imports: Official values till 1853; real values from 1854.

† Including re-exports.

‡ Value of ships included from 1899.




Comparisons on the logarithmic scale.

The logarithmic scale has special uses in the comparison of
series of figures, and the methods discussed in the section
devoted to that subject can be readily adapted. The difficulty of
the choice of units in comparing quantities of different natures
disappears when we deal only with ratios; we need no longer
trouble about the method of percentages. In investigating causal
relations we are more likely to find close connection in ratios
than in quantities; for if one set of phenomena is connected with
another, it is more likely that the relation will be a
proportional one (e.g., that an increase of 10 per cent. in some
measurable characteristic of the one corresponds to an increase
of 8 per cent. in a characteristic of the other), than an absolute
quantitative one (e.g., that an increase of 2s. in a price, at
whatever point it stands, corresponds to a decrease of 100 in the
number of purchasers). Resemblance between two curves on the
logarithmic scale will mean the correspondence in proportional
change, while resemblance on the natural scale means
correspondence in absolute change.

There is less trouble in this new method in equating averages 
than before. For if the logarithms of two series are taken, it is 
quite immaterial at what height on a logarithmic scale the two 
are plotted out ; alteration of height only means multiplication 
of all the items by a constant quantity, and does not alter the 
appearance or proportion of their fluctuations. The method to 
be employed is as follows : — Draw the curves representing two 
series of figures on a logarithmic scale; then shift the lower 
curve vertically upwards to and over the other, till the closest 
possible correspondence is obtained ; draw it in in this position, 
and the two series can be accurately compared. 
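The vertical shifting described above may be sketched as follows. The two series are hypothetical, and taking the mean gap between the logarithms as the shift giving "the closest possible correspondence" is an assumption (it is the least-squares choice); the text itself prescribes fitting by eye.

```python
import math

def vertical_shift(upper, lower):
    """Mean gap between the two curves on a base-10 logarithmic scale."""
    gaps = [math.log10(u) - math.log10(l) for u, l in zip(upper, lower)]
    return sum(gaps) / len(gaps)

# Hypothetical series: the second is proportional to the first throughout.
upper = [200.0, 220.0, 242.0, 230.0]
lower = [50.0, 55.0, 60.5, 57.5]

shift = vertical_shift(upper, lower)
multiplier = 10 ** shift  # raising the curve by `shift` multiplies every item by this

shifted = [math.log10(l) + shift for l in lower]
residuals = [math.log10(u) - s for u, s in zip(upper, shifted)]
print(round(multiplier, 6), [round(r, 9) for r in residuals])
```

Because the lower series is an exact multiple of the upper, the shifted curve falls on the other and the residuals vanish; with real figures the residuals show where the proportional changes differ.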

Equation of fluctuations.

The following example employs this method with a further
development, corresponding to that of p. 177, supra, where
fluctuations are equated. In the earlier method we used the
average as a position from which to measure the various items,
and adapted the scales; a similar method might again be used, but
it is more convenient to keep to one logarithmic scale, and now
we have no base line to consider. Calculate the fluctuations much
as before, but express them as percentages of the adjacent maxima
before taking their average. In the following example it is found
that a fluctuation of 8.4 per cent. in the number employed, in
those trade unions



[Plate: MARRIAGE RATE AND EMPLOYMENT. Fig. 1. Comparison in 1865-93, on natural scale. Fig. 2. The same; logarithmic scale. Fig. 3. Comparison in 1880-1896; same scale as figure 2. Curves: marriage rate and employment. Axes: years; logs of marriage rate; logs of per cent. employed.]

whose returns are accessible,* corresponds to one of 9.7 per cent.
on the marriage rate. To investigate a possibly closer
correspondence, assume that a portion of the number employed do
not influence the marriage rate, and find what part must be
subtracted before this 8.4 per cent. of the total forms as much
as 9.7 per cent. of the remainder; the average percentage of
members of the trade unions at work in the selected period was
95.1; 8.4 per cent. of this is 7.99, which forms 9.7 per cent. of
82.4. Thus 12.7, the difference between 95.1 and 82.4, may be
considered as not influencing the question, and subtracted
throughout before logarithms are taken. This process would
be replaced on the natural scale by equating the averages of
two series, and drawing one base line so far below the other
that average fluctuations would be represented by the same
vertical distance for both series; which process is exactly
equivalent to that adopted on p. 177. Expressed algebraically,
we are now investigating the equation:

$$\log (y - c) - \log x = k, \text{ a constant,}$$

where $c$ and $k$ are constants to be so selected as to give the
closest fit, and $y$ and $x$ are the quantities to be compared.
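The subtraction of 12.7 can be retraced step by step. The function below is a hypothetical helper, written only to reproduce the arithmetic of the passage from the three figures it gives (95.1, 8.4 per cent. and 9.7 per cent.).

```python
def uninfluential_part(average, pct_of_total, pct_of_remainder):
    """Find the constant c to subtract so that a fluctuation which is
    pct_of_total of the average becomes pct_of_remainder of (average - c)."""
    fluctuation = average * pct_of_total / 100
    remainder = fluctuation / (pct_of_remainder / 100)
    return fluctuation, remainder, average - remainder

fluctuation, remainder, c = uninfluential_part(95.1, 8.4, 9.7)
print(round(fluctuation, 2), round(remainder, 1), round(c, 1))
```

The value of c found in this way is the constant of the equation log (y - c) - log x = k, with y the percentage employed and x the marriage rate.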

In the adjacent diagrams, figure 1 gives the figures in the
natural scale ; figure 2 gives them on the logarithmic scale, after 
they have been arranged so as to make average percentage 
fluctuations equal ; while in figure 3 the shorter period, 1880-96, 
is treated in a method precisely similar to that of figure 2. The 
actual numbers and logarithms are given on the next page. 

Note. In making comparisons of series, whether on the logarithmic or
natural scale, instead of taking the average of the whole period and
measuring deviations therefrom, it is legitimate and often advantageous to
measure the deviations from a curve smoothed, as on p. 151, a line of moving
averages. We are then ignoring the causes which have a gradual and permanent
effect, and comparing the short-period fluctuations. See Mr R. Hooker's
paper in the Statistical Journal, September 1901.

* The figures in columns 2 and 4 in the second table on the next page
are taken from Mr G. H. Wood's paper on Some Statistics of Working Class
Progress since 1860, Statistical Journal, 1900, where a valuable logarithmic
diagram will be found, illustrating many of the points of this section.






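MARRIAGE RATE PER 1,000.

Year.        Maxima.    Minima.
1869                    15.9
1873         17.6
1879                    14.4
1881-83      15.5
1886                    14.1
1891         15.6
1893                    14.7

[Second heading illegible.]

Year.        Maxima.    Minima.
1868                    91.5
1879                    87.5
[illegible]  98.1
1889-90      97.9

[The columns of differences and percentages, and the remaining entries, are illegible.]

Average percentage employed, 1865-93, 95.1: 8.4 per cent. of 95.1 is 9.7 per cent. of 82.4.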



Year.    Rate.      Log.       Per cent.    Less       Log.
                               employed.    12.7.
1865     17.5       1.243      98.0         85.3       1.931
1866     17.5       1.243      96.9         84.1       1.925
1867     16.5       1.217      92.7         80.0       1.903
1868     16.1       1.207      91.5         78.8       1.896
1869     15.9       [illeg.]   [illeg.]     79.9       1.902
1870     [illeg.]   1.207      [illeg.]     83.0       1.919
1871     16.7       1.223      [illeg.]     85.5       1.932
1872     17.4       1.240      98.9         86.2       1.935
1873     17.6       1.245      98.7         86.0       1.934
1874     17.0       1.230      98.2         85.5       1.932
1875     16.7       1.223      97.5         84.8       1.928
1876     16.5       1.217      96.4         83.7       1.923
1877     [illeg.]   1.196      95.6         82.9       1.919
1878     15.2       1.182      93.7         81.0       1.908
1879     14.4       1.158      87.5         74.8       1.874
1880     14.9       1.173      94.1         81.4       1.911
1881     15.1       1.179      96.5         83.8       1.923
1882     15.5       1.190      98.1         85.4       1.931
1883     15.5       1.190      97.8         85.1       1.930
1884     15.1       1.179      92.6         79.9       1.901
1885     14.5       1.161      91.0         78.3       1.894
1886     14.2       1.152      90.4         77.7       1.890
1887     14.4       1.158      92.6         79.9       1.902
1888     14.4       1.158      95.2         82.5       1.916
1889     15.0       1.176      97.9         85.2       1.930
1890     15.5       1.190      97.9         85.2       1.930
1891     15.6       [illeg.]   96.5         [illeg.]   [illeg.]
1892     15.4       [illeg.]   93.7         [illeg.]   [illeg.]
1893     14.7       1.167      92.5         79.8       1.902

[Final row, label illegible]   1.196







TABLE OF LOGARITHMS.

[Logarithm tables, pp. 195-196; the figures are illegible in this copy.]



CHAPTER VIII. 
ACCURACY. 




Introductory. 

The nature of measurement.

There is not in existence a perfectly accurate measurement,
physical or economical, just as there is no perfectly straight line
or perfect fluid. We can best illustrate the nature of economic
measurements by considering that of physical. It is easy to weigh
substances accurately to 1 gram:
then by obtaining a good balance, we can, as our apparatus is
improved, weigh accurately to a centigram, milligram, and one- 
tenth of a milligram ; but for accuracy beyond this the balance 
fails us. Similarly in measuring angles, the naked eye can 
distinguish an object which subtends one-thirtieth of a degree ; 
with a sextant a measurement can be taken correctly to fifteen 
seconds of arc ; the Greenwich astronomers can make observa- 
tions correct to one-hundredth part of a second, but we again 
come to a point beyond which precision is unattainable. 

In such cases the result is stated as correct to a milligram, 
or whatever it may be ; in the same way we speak of an esti- 
mated sum of money correct to a pound. 

Physical and statistical measurements.

A task which has considerable resemblance to some statistical
estimates, is the measurement of the parallax of the sun, which
determines its distance from the earth. During the eighteenth
century astronomers estimated it as 10", equivalent to 96,000,000
miles. As methods of observation and instruments were improved,
observers began to agree that the whole number of seconds was
8, but gave various estimates for the first decimal figure. Since
1865 there have been very few estimates which have not given 8
as the nearest figure for this place (8.8"), while more recent
observations agree in making the parallax from 8.76" to 8.78".
We may, therefore, consider that the distance is now accurately
known to within 1 in 400. Notice in this connection, first, that
the earlier observations have been subject to corrections;



secondly, that better agreement has been attained as time has 
gone on ; thirdly, that neither absolute agreement nor ab- 
solute accuracy have yet been obtained. So it is with statistical 
measurements ; we might instance the gradual settlement of the 
curve representing expectation of life, the measurement of the 
fall in prices, and the development of wage statistics. 

Degrees of possible accuracy.

Again in physical measurements, though we can sometimes
reach a very high degree of accuracy, as, for instance, in the
weight of a cubic foot of water which could doubtless be known
correctly to one part in a million, in other cases we are glad if
we can measure to one part in ten, as, for instance, in the
distance of the nearest fixed star from us, which is, roughly,
from 34 to 37 billion miles. So in statistics
it is something if we know that the total capital of the United 
Kingdom was between 2^ and 10 thousand million pounds in 
1885, or if we know that the average weekly wage of working- 
men in full work was from 21s. to 27s. in 1886. The weak point 
in such statements is that often when we have made an estimate, 
which we know to be inexact, we are not able to give any estimate
of the limits of the error. We are not so definite as The
Modern Traveller, who

" . . . knew the weather to a T,
The longitude to a degree,
The latitude exactly."

We are not able to say "our estimate is 24s. 5d.; we are not
certain to 1d., but it is not possible that we are as much as
1s. wrong"; whereas in physical measurements we can often give
the result correct to the smallest graduation of the instrument
employed.

The accuracy generally needed.

On the other hand, though we cannot obtain exactness, we
can in many cases estimate to that degree of accuracy which is
required for practical purposes. In common use only a certain
conventional accuracy is needed.

Thus, to take some miscellaneous instances, the area of an estate 
is given in acres, roods, and poles, but not correct to square 
yards ; the market prices of shares do not change less than 
t\ ; we keep the day, not the hour, of our birth ; railway 
time-tables do not show seconds ; ocean steamers are timed to 
start at certain hours, not minutes; height is measured correct 
to one-tenth of an inch ; a hundred yards race is timed to one- 




tenth of a second. Similarly in statistical estimates, we seldom 
need that our results shall be accurate within one per thousand, 
or even 1 per cent. One per thousand of the working week is
only three minutes; 1 per cent. of the week's wage is only 3d.
We do not care to know the population of London within 100,
the expenditure of the Exchequer within £1,000, or the expectation
of life within a day. It is often possible to attain practical
accuracy within such limits.

Definition of Error. — For purposes of measurement we 
may take the following definition: — The error in an estimate 
is the ratio of the difference between the estimate and the true 
value, to the estimate ; the error is to be reckoned positive when 
the true value exceeds the estimate. 

Thus if the average weekly wage of agricultural labourers
was in reality 14s., and we estimated it as 13s., our error would
be $\frac{14 - 13}{13} = \frac{1}{13}$, or 7.7 per cent.; if we had estimated it as 15s., the
error would be $\frac{14 - 15}{15} = -\frac{1}{15}$, or -6.6 per cent.

In algebraic notation, if $u$ be the measurement of a quantity whose
true value is $u'$, then $\frac{u' - u}{u}$ is the error in the estimate, which we shall
call $e$; so that $e = \frac{u' - u}{u}$, and $u' = u(1 + e)$.* $\frac{1}{e}$ is an appropriate measure
of the accuracy or precision of an estimate, becoming infinite when the
error is zero.
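The definition can be tried on the two wage estimates of the text. This is only a sketch; the function names `error` and `precision` are chosen here for illustration.

```python
from fractions import Fraction

def error(estimate, true_value):
    """Error as defined in the text: (true value - estimate) / estimate."""
    return Fraction(true_value - estimate, estimate)

def precision(e):
    """Reciprocal of the error; not defined when the error is zero."""
    return 1 / e

under = error(13, 14)  # estimate too low: positive error
over = error(15, 14)   # estimate too high: negative error

print(under, round(float(under) * 100, 1))
print(over, round(float(over) * 100, 1))
print(13 * (1 + under))  # recovers the true value, u' = u(1 + e)
```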

Statement of errors.

In the nature of things, when we are dealing with errors,
we do not know their magnitude; the most we can know
is their probable and possible extent. We might estimate, for
instance, the percentage of unemployed in a certain year as 4.5,
and add, from information in our possession (coming from a study
of wage-bills or the reports of relief agencies), that we
considered this to be within .5 of the fact; we should then write
the number 4.5 ± .5, meaning that the error in the estimate as
defined above was unlikely to be more than $\frac{.5}{4.5} = \frac{1}{9}$, or 11 per cent., and the

* This and most of the following algebraic paragraphs are from a paper
on the Relations between the Accuracy of an Average and that of its
Constituent Parts, by the present author, in the Statistical Journal of
December 1897.




precision was 9. In such a case we can also give definite limits.
The percentage unemployed must lie between 0 and 100; and
if we could actually enumerate 1 per cent. of the working-class
as out of work, and also 92 per cent. as in work, we should
know that the number required was between 1.0 and 8.0 per
cent., and the maximum error in our estimate, 4.5, was $\frac{3.5}{4.5} = \frac{7}{9}$, or
78 per cent. Even this is more precise than the original
statement, "the percentage is 4.5, error unknown." By further
investigation we might perhaps bring the limits of error nearer to
each other, and decide that it was practically certain that the
percentage required was between 4 and 4.5; then we ought to
say "the number unemployed is .04 . . . of the working-class, the
estimate being correct to the last figure given." This statement
is of the same nature as, "The body weighs 15 lbs. 3 oz., correct
to an ounce."
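Both statements of error for the estimate 4.5 can be recomputed from the figures given. The sketch below uses exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

estimate = Fraction("4.5")

# Case 1: the estimate is believed to be within .5 of the fact.
likely_error = Fraction("0.5") / estimate
likely_precision = 1 / likely_error

# Case 2: 1 per cent. counted out of work and 92 per cent. counted in work,
# so the true percentage lies between 1 and 100 - 92 = 8.
lower, upper = Fraction(1), Fraction(100 - 92)
max_error = max(estimate - lower, upper - estimate) / estimate

print(likely_error, likely_precision, max_error)
print(round(float(likely_error) * 100), round(float(max_error) * 100))
```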

While, on the one hand, it is clear that we cannot often
obtain close definite limits to our errors, on the other we can
very often see that some of the digits in a total are almost
certainly right and others almost certainly wrong. Thus when
we see in the Registrar-General's Report that the population of
the United Kingdom in 1895 was 39,124,496, the estimate being
made from the census of 1891, and the increase calculated on
the basis of the increase since 1881, we may be certain that
the last two, or the last three, digits are no better than
guess-work; while the first two, or the first three, are correct.
Thus the statement should read: Population was 39.1 millions, or
39,124,000 ± 5,000, or whatever figures our examination of the
varying rate of progress of the population led us to adopt, and
this statement is actually more correct than the previous one.

Neglect of minutiae.

It is the custom in many classes of estimates to give the
figures to the uttermost farthing. This is possibly right in
official publications; for the business of the office is to
receive and tabulate returns, stating how and whence they came,
and leaving to the economist or the statistician the task of
deciding the degree of accuracy pertaining to them. But in
summary descriptions and accounts, and in scientific estimates,
it is not merely unnecessary to give these last figures (both
because they are not accurately known, and because they generally
have no importance to the argument or significance to the
reader), but it is positively inaccurate.




The easiest way to avoid the inaccuracy is simply to state totals 
in so many thousands (e.g., the earth is 8,000 miles in diameter),
or if for any reason more exact measure be required (as when 
we are comparing the equatorial diameter with the smaller one 
through the poles), the scientific way is to give the number as 
far as it has been fairly calculated, and to indicate its precision. 



Rules for Computing the Effect of Errors. 

We may now give some rules connecting the errors of a 
complex estimate with those of the elements which form it. 

I. The error in an estimated sum is equal to the sum of the 
errors in the parts when each is multiplied by the ratio of the 
corresponding part to the sum. 

For if we estimate $n$ quantities as $u_1, u_2, \ldots, u_n$, and their sum
as $u$, so that $u = u_1 + u_2 + \ldots + u_n$, and the errors of the
quantities are $e_1, e_2, \ldots, e_n$, and that of the sum is $e$,
then the true value of the sum is $u(1 + e)$, and the true values of the
parts are $u_1(1 + e_1)$, $u_2(1 + e_2)$, ..., so that

$$u(1 + e) = u_1(1 + e_1) + u_2(1 + e_2) + \ldots + u_n(1 + e_n);$$

but

$$u = u_1 + u_2 + \ldots + u_n;$$

hence, by subtraction,

$$ue = u_1 e_1 + u_2 e_2 + \ldots + u_n e_n,$$

and

$$e = e_1 \times \frac{u_1}{u} + e_2 \times \frac{u_2}{u} + \ldots + e_n \times \frac{u_n}{u}.$$

The formula is easily adapted to the case where some of the 
parts are subtractive. 

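To take an arithmetical example, if two trade unions return
respectively 555 and 45 members as out of work, while the true
numbers are 565 and 50, so that the errors are $\frac{10}{555}$ and $\frac{5}{45}$, then
the error in the sum is by the above rule:

$$\frac{10}{555} \times \frac{555}{600} + \frac{5}{45} \times \frac{45}{600} = \frac{15}{600} = \frac{1}{40}, \text{ or } 2.5 \text{ per cent.}$$

The greater error in the returns of the smaller union has little
effect on the total.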
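Rule I. may be verified on the two trade-union returns. The sketch compares the rule with the error of the total computed directly; `error_of_sum` is a name introduced here for illustration.

```python
from fractions import Fraction

def error_of_sum(estimates, errors):
    """Rule I: each error weighted by its part's share of the estimated sum."""
    total = sum(estimates)
    return sum(e * Fraction(u, total) for u, e in zip(estimates, errors))

estimates = [555, 45]
true_values = [565, 50]
errors = [Fraction(t - u, u) for u, t in zip(estimates, true_values)]

by_rule = error_of_sum(estimates, errors)
direct = Fraction(sum(true_values) - sum(estimates), sum(estimates))
print(errors, by_rule, direct)
```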

We can apply the rule to the important case where we 
can estimate a great part of a required total with considerable 




accuracy, while we are ignorant of a smaller part. Thus we
may receive returns from several unions that 33,650 are out
of work, and have reason to know that the error is not more
than 1 per cent., while some smaller unions do not send any
returns; we make an estimate for the smaller unions, say that
1,000 of their members are unemployed, and suppose a very
large error, say $\frac{2}{3}$ or 67 per cent. Then the error in the total is
less than

$$\frac{1}{100} \text{ of } \frac{33650}{34650} + \frac{2}{3} \text{ of } \frac{1000}{34650} = 2.9 \text{ per cent.},$$

an error very much nearer that of the larger returns than that
of the smaller. In the preceding sentence we say "less than,"
because we assume that we have taken an outside limit for the
smaller error.
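The outside limit of 2.9 per cent. follows from the same rule applied to the limits of error. A minimal sketch, with an illustrative function name:

```python
from fractions import Fraction

def error_limit_of_sum(parts, error_limits):
    """Rule I applied to outside limits of error instead of known errors."""
    total = sum(parts)
    return sum(lim * Fraction(p, total) for p, lim in zip(parts, error_limits))

parts = [33650, 1000]                        # returned total, estimated remainder
limits = [Fraction(1, 100), Fraction(2, 3)]  # 1 per cent. and two-thirds

limit = error_limit_of_sum(parts, limits)
print(round(float(limit) * 100, 1))
```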

II. The error in the arithmetic average of several estimates is
the sum of the errors of these estimates, when each is multiplied by
the ratio of the corresponding estimate to that of the sum of the
estimates.

Error in average.

For if $m_1, m_2, \ldots, m_n$ are $n$ estimates of quantities whose true
values are $m_1(1 + e_1)$, $m_2(1 + e_2)$, ..., the estimated and
true averages are respectively

$$\frac{m_1 + m_2 + \ldots}{n} \text{ and } \frac{m_1(1 + e_1) + m_2(1 + e_2) + \ldots}{n},$$

and the error in the average is

$$\left[\frac{m_1(1 + e_1) + m_2(1 + e_2) + \ldots}{n} - \frac{m_1 + m_2 + \ldots}{n}\right] \div \frac{m_1 + m_2 + \ldots}{n} = \frac{m_1 e_1 + m_2 e_2 + \ldots}{m_1 + m_2 + \ldots}$$

$$= e_1 \times \frac{m_1}{S} + e_2 \times \frac{m_2}{S} + \ldots,$$

where $S$ denotes the sum of all the $m$'s.

It is easily seen that no individual error can have much 
influence on the result, that the error in the average would be 
nearly of the same magnitude as one of the individual errors, if 
these were not very unequal, and all positive or all negative, and 
that if, as is generally the case, some are positive and some 
negative (a point we shall consider presently), the error would be 
considerably lessened. 
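The remark that errors of mixed sign lessen the error of the average can be illustrated as follows. The three estimates and their errors are hypothetical, chosen only so that the fractions come out simply.

```python
from fractions import Fraction

def error_of_average(estimates, errors):
    """Rule II: each error weighted by its estimate's share of the sum S."""
    s = sum(estimates)
    return sum(e * Fraction(m, s) for m, e in zip(estimates, errors))

estimates = [20, 25, 30]  # hypothetical estimates
mixed = [Fraction(1, 20), Fraction(-1, 25), Fraction(1, 30)]
same_sign = [abs(e) for e in mixed]

e_mixed = error_of_average(estimates, mixed)
e_same = error_of_average(estimates, same_sign)

# Direct check: error of the average computed from the true values.
true_values = [m * (1 + e) for m, e in zip(estimates, mixed)]
est_avg = Fraction(sum(estimates), len(estimates))
direct = (sum(true_values) / len(true_values) - est_avg) / est_avg
print(e_mixed, e_same, direct)
```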




III. The error in a weighted average is the sum of{i)an error 
due to errors in the quantities, similar to the error of an unweighted 
average, and (2) an error due to errors in the weigttts, which be- 
comes very small when the original quantities are nearly equal 

For if o/j, a/g • • • ^^ estimated weights applied to estimated 
Error in quantities m-^, m^ . . , , and if the true values of the 
weighted weights are w^ (i + ^iX ^2 (^+^2) • • • > ^^^ o^ ^^^ 
average. quantities »»i (i +^1), Wg (i +^3) • • • > ^^^^ ^^ ^"^r is — 

[S{w (i +^). w (i +€)} %mw~\ ^ ^mw 
S{ze/(i+€)} ^y^w 

If we simplify this expression and neglect the products of two of the 
errors e and € (for if ^ and « are each .1, their product is only .01), we 
obtain — 



Error in weighted average is — 



siinilar~| 

are I 

quantitiesj 



L 2^/a/ ^ ^mW J L Sa/.2/«a/ pairs of quanUt 

If m^ — m^ is small, or if any other pair of the original quanti- 
ties are nearly equal, the first term in the second bracket becomes 
very small. Very great errors are required in the weights to make 
any appreciable error in the average. In fact, the errors in the 
quantities have so much more influence than those in the weights on 
the weighted average of not very unequal quantities, that errors in 
the weights can generally be neglected. Many numerical examples 
of this principle were given in the section on weighted averages. 
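The relative smallness of the weight term can be illustrated with a short Python sketch. The figures are invented and the function names are hypothetical. The weight term is computed in the equivalent form $\Sigma mw\epsilon/\Sigma mw - \Sigma w\epsilon/\Sigma w$, which is algebraically the same as the sum over pairs.

```python
def weighted_average(quantities, weights):
    """Weighted average: sum(m * w) / sum(w)."""
    return sum(m * w for m, w in zip(quantities, weights)) / sum(weights)


def rule_iii_terms(m, w, e, eps):
    """Return (quantity term, weight term) of the first-order error."""
    smw = sum(mi * wi for mi, wi in zip(m, w))
    sw = sum(w)
    quantity_term = sum(mi * wi * ei for mi, wi, ei in zip(m, w, e)) / smw
    weight_term = (
        sum(mi * wi * xi for mi, wi, xi in zip(m, w, eps)) / smw
        - sum(wi * xi for wi, xi in zip(w, eps)) / sw
    )
    return quantity_term, weight_term


# Invented figures: nearly equal quantities, weights wrong by 5 to 10 per cent.
m = [24.0, 25.0, 26.0]
w = [100.0, 200.0, 150.0]
e = [0.02, -0.01, 0.03]      # errors in the quantities
eps = [0.10, -0.10, 0.05]    # errors in the weights

true_m = [mi * (1 + ei) for mi, ei in zip(m, e)]
true_w = [wi * (1 + xi) for wi, xi in zip(w, eps)]

estimated = weighted_average(m, w)
exact_error = (weighted_average(true_m, true_w) - estimated) / estimated
quantity_term, weight_term = rule_iii_terms(m, w, e, eps)

print(f"exact error      : {exact_error:.5f}")
print(f"quantity term    : {quantity_term:.5f}")
print(f"weight term      : {weight_term:.5f}")
```

In this sketch the weights are wrong by as much as 10 per cent., yet their contribution is a small fraction of that of the quantity errors; the gap between the exact error and the two first-order terms is the neglected product of $e$ and $\epsilon$.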

IV. The error in a product is approximately the sum of the 
errors in its factors, due regard being paid to sign. 

Error in product.

For if $f_1, f_2, \ldots, f_n$ are the estimated factors, whose true values are $f_1(1+e_1)$, $f_2(1+e_2)$, ..., then the error in the product is

$$\frac{f_1(1+e_1)\cdot f_2(1+e_2)\cdots - f_1f_2\cdots}{f_1f_2\cdots} = (1+e_1)(1+e_2)\cdots - 1 = e_1+e_2+\cdots+e_n,$$

if we neglect products of two or more $e$'s.

The e's are equally likely, a priori, to be positive or negative.
If two e's are of different signs, they tend to neutralise one
another. The error in a product may be great if all the errors 
of the factors are of the same sign, even if they are small 
individually. 

For example, if we estimate that 100 men are earning on the
average 25s. each, while in reality there are 105 men earning
26s., the error in the estimated total sum earned is, by formula,
$\frac{5}{100} + \frac{1}{25} = .09$.
If, with the same estimates, the real quantities had been 105
and 24s., the error in the product would have been $\frac{5}{100} - \frac{1}{25} = .01$.
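The wage example can be worked through in a few lines of Python. This is only a sketch with hypothetical function names; it compares the approximate error given by Rule IV. with the exact error of the product.

```python
def product_error_exact(errors):
    """Exact error of a product: (1 + e1)(1 + e2)... - 1."""
    product = 1.0
    for e in errors:
        product *= 1.0 + e
    return product - 1.0


def product_error_approx(errors):
    """Rule IV.: sum of the errors in the factors, with regard to sign."""
    return sum(errors)


# Estimate: 100 men at 25s.  Reality: 105 men at 26s.
same_sign = [(105 - 100) / 100, (26 - 25) / 25]
# Estimate: 100 men at 25s.  Reality: 105 men at 24s.
opposite_sign = [(105 - 100) / 100, (24 - 25) / 25]

for label, errs in (("same sign", same_sign), ("opposite sign", opposite_sign)):
    print(label, round(product_error_approx(errs), 3),
          round(product_error_exact(errs), 3))
```

The approximate values are .09 and .01, against exact values of .092 and .008; the difference in each case is the neglected product of the two errors.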

V. The error in a ratio is approximately the difference between
the errors in its two terms, due regard being had to sign.

Error in ratio.

For if $u_1$, $u_2$ be the estimated terms, whose true values are $u_1(1+e_1)$ and $u_2(1+e_2)$, then the error in the ratio is

$$\left\{\frac{u_1(1+e_1)}{u_2(1+e_2)} - \frac{u_1}{u_2}\right\} \div \frac{u_1}{u_2} = \frac{1+e_1}{1+e_2} - 1 = e_1 - e_2,$$

if we neglect terms of the second order in the $e$'s.

If the errors in the terms are both positive or both negative,
they tend to neutralise one another; if they are also nearly equal,
the error in the ratio becomes very small.
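A brief Python sketch of Rule V. follows, with invented errors and hypothetical function names. It shows that errors of the same sign largely cancel in a ratio, while errors of opposite sign accumulate.

```python
def ratio_error_exact(e1, e2):
    """Exact error of a ratio whose terms have errors e1 and e2."""
    return (1.0 + e1) / (1.0 + e2) - 1.0


def ratio_error_approx(e1, e2):
    """Rule V.: difference between the errors in the two terms."""
    return e1 - e2


# Invented errors, for illustration only.
same_sign = (0.05, 0.04)
opposite_sign = (0.05, -0.04)

for label, (e1, e2) in (("same sign", same_sign),
                        ("opposite sign", opposite_sign)):
    print(label, round(ratio_error_approx(e1, e2), 4),
          round(ratio_error_exact(e1, e2), 4))
```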

We can apply Rule V. to the error in comparison of two 
averages of similar quantities estimated at different dates. 

With the same notation as under Rules II. and III., using $m$, $e$, $\epsilon$ for the quantities at one date, and $m'$, $e'$, $\epsilon'$ for similar quantities at another date, then the error in the ratio of the simple average of $m'_1, m'_2, \ldots$ to the simple average of $m_1, m_2, \ldots$ is, by Rules II. and V.,

$$\Sigma\left(e'_1\cdot\frac{m'_1}{\Sigma m'}\right) - \Sigma\left(e_1\cdot\frac{m_1}{\Sigma m}\right).$$

Now if the quantities have not changed much during the period between two observations, the fraction $\frac{m_1}{\Sigma m}$ will differ little from $\frac{m'_1}{\Sigma m'}$, and so on.

Neglecting these differences in comparison with the quantities themselves, a legitimate process when we are estimating the approximate influence of errors, we have:

Error in the ratio of the simple averages $= \Sigma\left\{(e'_1 - e_1)\cdot\frac{m'_1}{\Sigma m'}\right\}$.

If the two estimates have been made under nearly similar circumstances, leading to similar chances of errors, $e_1$ and $e'_1$ are likely to be not only of the same sign, but nearly equal.

Write $\delta_1, \delta_2, \ldots$ for $(e'_1 - e_1)$, $(e'_2 - e_2)$, ..., and we have:

Error $= \Sigma\left(\delta_1\cdot\frac{m'_1}{\Sigma m'}\right)$, where the $\delta$'s may be small.

The corresponding analysis for the error in the ratio of two 
weighted averages is too complicated to be given here ; * but 
using the principle that errors in weight are less important than 
errors in quantity, which applies with slight modifications, we 
may use the formula just given for the first approximation to 
the error in the ratio of two weighted averages. This formula 
may be put in words : — 

VI. The error in the ratio of two averages of similar series
of quantities, estimated at different dates, is approximately equal
to the sum of the differences between the errors in the corresponding
terms of the two series, each multiplied by the ratio of
the latter of these corresponding terms to the sum of all the terms
at the latter date.
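Rule VI. can be tried on a small invented example. In the Python sketch below, the figures and the function name are assumptions made for illustration: the errors at the two dates are of the same sign and nearly equal, as they would be if the estimates were made on similar methods.

```python
def rule_vi_error(later_estimates, earlier_errors, later_errors):
    """Rule VI.: sum of (e' - e) * m' / (sum of m')."""
    total_later = sum(later_estimates)
    return sum(
        (e_later - e_earlier) * m_later / total_later
        for m_later, e_earlier, e_later
        in zip(later_estimates, earlier_errors, later_errors)
    )


# Invented figures: three quantities estimated at two dates.
m_earlier = [20.0, 30.0, 50.0]
e_earlier = [0.05, 0.04, -0.03]
m_later = [22.0, 33.0, 54.0]
e_later = [0.06, 0.03, -0.02]

true_earlier = [m * (1 + e) for m, e in zip(m_earlier, e_earlier)]
true_later = [m * (1 + e) for m, e in zip(m_later, e_later)]

estimated_ratio = sum(m_later) / sum(m_earlier)
true_ratio = sum(true_later) / sum(true_earlier)
exact = (true_ratio - estimated_ratio) / estimated_ratio
by_rule = rule_vi_error(m_later, e_earlier, e_later)

print(f"exact: {exact:.4f}, by Rule VI.: {by_rule:.4f}")
```

Although the individual errors run up to 6 per cent., the error in the ratio of the averages is well under 1 per cent. in this sketch.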

Error in comparison of averages.

This rule is so important that it will be worth while to
illustrate it by an example, in which a further
quantity will be introduced.

If in each of two years we are able to estimate, as in our example
under Rule I., one part of a total more accurately than another part, we
can use the following formulae:

                                             First Year.               Second Year.
Estimated numbers or weights                 $w$; error $\epsilon$         $w'$; error $\epsilon'$
Estimated average income, or quantity        $m_1$; error $e_1$            $m'_1$; error $e'_1$
Estimated number, less accurately known      $rw$; error in $r$, $\rho$    $r'w'$; error in $r'$, $\rho'$
Estimated income                             $m_2$; error $e_2$            $m'_2$; error $e'_2$

$e_1$ and $e'_1$ are, by hypothesis, less than $e_2$ and $e'_2$.

Error in average for first year:

$$\left\{\frac{w(1+\epsilon)\,m_1(1+e_1) + r(1+\rho)\,w(1+\epsilon)\,m_2(1+e_2)}{w(1+\epsilon) + r(1+\rho)\,w(1+\epsilon)} - \frac{wm_1+rwm_2}{w+rw}\right\} \div \frac{wm_1+rwm_2}{w+rw}$$

$$= e_1\cdot\frac{m_1}{m_1+rm_2} + e_2\cdot\frac{rm_2}{m_1+rm_2} + \rho\cdot\frac{r}{1+r}\cdot\frac{m_2-m_1}{m_1+rm_2},$$

if we neglect products of $e$ and $\rho$.

* It will be found in the article cited on p. 201 above ; a further approxi- 
mation also for the error in the ratio of simple averages is there given. 




Here the errors, $e_2$ and $\rho$, connected with the less accurately known
part, are each multiplied by $r$, the ratio of the weight of that part to the
weight of the better known part; while $e_1$, the remaining error, is by
hypothesis small.

If for simplicity of argument we assume that the ratio of the unknown 
part to the whole (but not the error in estimating it) has remained 
unchanged, and also that the ratio of the estimated average incomes of 
the two parts has not altered, we have for the error in comparison — 

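$$(e'_1-e_1)\cdot\frac{m_1}{m_1+rm_2} + (e'_2-e_2)\cdot\frac{rm_2}{m_1+rm_2} + (\rho'-\rho)\cdot\frac{r}{1+r}\cdot\frac{m_2-m_1}{m_1+rm_2}.$$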

Thus in estimating the change in average wages of Scotch 
agricultural labourers, we have figures similar in character to the 
following : — 



                                        1867.          1892.
                                                       (Corresponding Numbers.)
Married Ploughmen.
  Estimated number                      1,000          1,200
  Supposed true number                  1,010          1,220
  Estimated average income              £36            £49
  Supposed true average income          £35            £48

Farm-Servants.
  Estimated number                      200            240
  Average income:
    Money                               £21            £27 5s.
    Estimated value of board            £13            £14
    Total                               £34            £41 5s.
  Supposed true number                  220            240
  Supposed true total income            £37            £47

Here $w = 1{,}000$, $m_1 = 36$, $r = \frac{1}{5}$, $m_2 = 34$, $w' = 1{,}200$, . . .



Here it is supposed that we have overvalued the income of 
the married ploughmen, and undervalued that of the farm- 
servants in both cases. We suppose, as is the fact, that the 
value of the board and other perquisites of the farm-servants 
cannot be estimated with precision, and that the proportionate 
numbers in the two classes are not accurately known. 

Substituting in the above formula we find that the error in 
the estimated ratio of the average incomes of the two classes 
together in the two years is — 

−.006, due to errors in estimates of income of ploughmen.
+.008, due to errors in estimates of income of servants.
−.001, due to errors in ratios of the numbers in the two classes.




Thus the last error, due to weights, is very small, and the
second error, due to ignorance of the value of board, is reduced
by the smallness of the number employed to a magnitude comparable
with the first.

The whole error is, therefore, by formula +.001. Going to
the actual figures, we find the estimated ratio of the second to
the first to be 1.338 to 1, and the supposed true ratio to be
1.335 to 1; that is, the error is $\frac{.003}{1.338} = .002$.

The difference between the two methods of calculation is
then 1 in the third decimal place, which is accounted for by
the neglect of the less important terms.

It is to be noticed that the error in the ratio of two quantities 
is not the same as the error which we might be inclined to 
estimate, the error in the percentage increase. Thus in the case 
just taken, the estimated and true percentage increases are 33.8 
and 33.5, and the error in the percentage increase is .01. For 
accuracy in such calculations, then, we require the error found 
by formula, according to Rule VI., to be very small. 
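The distinction between the two errors can be verified from the figures just quoted. The short Python sketch below uses hypothetical variable names and takes the magnitudes of the errors without regard to sign.

```python
estimated_ratio = 1.338
true_ratio = 1.335

# Error in the ratio itself (magnitude only).
error_in_ratio = abs(true_ratio - estimated_ratio) / estimated_ratio

# Error in the percentage increase derived from the same ratio.
estimated_increase = (estimated_ratio - 1.0) * 100.0   # 33.8 per cent.
true_increase = (true_ratio - 1.0) * 100.0             # 33.5 per cent.
error_in_increase = abs(true_increase - estimated_increase) / estimated_increase

print(round(error_in_ratio, 3), round(error_in_increase, 2))
```

The same absolute discrepancy of .003 is about four times as large relative to the increase (.338) as it is relative to the ratio (1.338).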



Biassed and Unbiassed Errors. 

Errors are biassed or unbiassed.

In the consideration of all errors in averaging or comparing,
it is important to distinguish two classes of errors, those which
are biassed and those which are unbiassed. The
difference can be made clear by illustrations. If a
number of men are sent to investigate the condition
of an industry in different places, with a view of proving
that wages are high, conditions of work healthy, and so on, they 
would probably, by examining only the best conducted works, 
and taking the wages only of the more skilled and regular work- 
men, produce an average for each town which would be too high. 
On the other hand, if there was no brief to be held, but the 
investigation was impartial, the commissioners would in some 
towns take too high an average, in others too low, according to 
their idiosyncrasies and to circumstances. In the first case, the 
errors would be biassed, all in the same direction, all tending to 
increase the average, whose errors would be equal to the average 
error in the different towns. In the second case, the errors 
would be unbiassed, just as likely to be in excess or defect, and 


the more estimates made, the smaller would the resulting error 
be. The following figures would illustrate this : — 





                                      Fact.     Biassed       Unbiassed
                                                Estimate.     Estimate.
                                      s.        s.            s.
Average Wages in District a           24        25            24
   "       "        "     b           23        25            25
   "       "        "     c           26        27            25
   "       "        "     d           27        28            28
   "       "        "     e           28        30            27

Averages                              25.6      27            25.8
Errors                                ...       5.5%          1%
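The averages and errors in the table can be reproduced as follows. This Python sketch assumes, as the printed percentages suggest, that each error is measured as a proportion of the true average; the function names are hypothetical.

```python
fact = [24, 23, 26, 27, 28]          # true average wages, in shillings
biassed = [25, 25, 27, 28, 30]       # every estimate too high or exact
unbiassed = [24, 25, 25, 28, 27]     # some too high, some too low


def mean(values):
    return sum(values) / len(values)


def percentage_error(estimate, truth):
    """Excess of the estimated average over the true one, per cent."""
    return 100.0 * (estimate - truth) / truth


biassed_error = percentage_error(mean(biassed), mean(fact))
unbiassed_error = percentage_error(mean(unbiassed), mean(fact))

print(mean(fact), mean(biassed), mean(unbiassed))
print(round(biassed_error, 1), round(unbiassed_error, 1))
```

The individual unbiassed estimates are wrong by as much as 2s., yet their average is out by less than 1 per cent.; the biassed estimates are seldom further out individually, but their errors do not cancel.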



In measuring the distance of a bicycle ride on a mile-stoned 
road, it is found that the distances between successive milestones 
are not exact, but perhaps 100 to 200 yards out ; but it is nearly 
as likely that the errors will be in excess or defect, and the greater 
the distance gone the smaller will be the error, as defined. The 
errors are unbiassed. If, on the other hand, the bicyclist trusts 
to his cyclometer, he will have to deal with a biassed error, for the 
instrument will not fit the wheel exactly, but will always register 
say 1,800 yards when the machine has gone a mile. This is a 
case where the bias can be measured and allowed for, whereas 
the unbiassed errors must be left to eliminate themselves. It is 
frequently the case that biassed errors are due to a wrongly gradu- 
ated instrument ; unbiassed to separate faulty measurements. 

In the census returns, the fact that many women return 
themselves as younger than their birth certificate states, causes 
a biassed error in the average age of the population ; the fact 
that people frequently return their ages at the nearest round 
number causes unbiassed error, and on the whole affects the 
average little. It is not improbable that in the Wage Census of 
1886-89, there was a general tendency to obtain returns from the 
more liberally and better conducted establishments ; this causes 
a biassed error in the average obtained.

Effect of biassed and unbiassed errors.

With these illustrations we can pass on to another principle of great
importance. Unbiassed errors are of little importance
compared with biassed errors in a simple
estimate; but biassed errors diminish when the
ratio of two similar estimates is taken.




For in an average of several quantities, which have biassed errors
($\eta_1, \eta_2, \ldots$) and unbiassed errors ($e_1, e_2, \ldots$), it is easy to see from
Rule II. that the resulting error may be written $\Sigma\left(e_1\cdot\frac{m_1}{\Sigma m}\right) + \Sigma\left(\eta_1\cdot\frac{m_1}{\Sigma m}\right)$.
In the first term, the errors being unbiassed, many of them are
positive, many of them negative, and they tend to neutralise one
another; in fact, if $E$ is typical of the errors $e_1, e_2, \ldots$, then a first
approximation to the error arising from them in the average is $\frac{E}{\sqrt{n}}$,*
$n$ being the number of quantities averaged.
Thus in the average of one hundred measurements, whose individual
unbiassed errors are about $E$, the resulting error is
about $\frac{E}{\sqrt{100}}$, or one-tenth of $E$. There is no counterbalancing tendency, on
the other hand, in the biassed errors; if each estimate was 10 per
cent. in excess, then the average is also 10 per cent. in excess.

Great effect of biassed errors.

When aiming at accuracy our principle always is
to take care of the pounds, and let the pence take
care of themselves; and it is quite futile to diminish the
unbiassed errors, that is to increase the precision of our
measurements, while a large biassed error runs through them all.
If we do not know of the existence of biassed errors, which
in reality pervade our estimates, there is no remedy; if we
do know of them, we are likely to obtain more accuracy by
the most erroneous corrections for them than by neglecting
them; for when we make unbiassed corrections for our biassed
errors, we reduce them to unbiassed errors, and then the more
terms we include in our average the smaller is our resulting error.
If, for instance, we find that the average weekly wage of
agricultural labourers throughout the country is 13s., and by
considering the circumstances of the thousand returns which we may
suppose led to this average we have reason to suppose that an
error of 1s. would be typical of the unbiassed errors in them,
then an error of $\frac{1s.}{\sqrt{1000}}$,† that is only ⅜d., may be expected to
result in the average. We have here a totally illusive accuracy;



* See article cited p. 201, supra, and Part II., Sect. V., infra.
† More correctly the error in the average is as likely as not to be as great
as this, and very unlikely to be much greater.




the part of the labourer's income which we have not included, 
payments at haytime and harvest, facilities for piece-work, cheap 
rent for cottage and land and smaller perquisites, is not capable 
of exact calculation. If we omit all these entirely we shall leave
an error in our average of 2s. or so; but if we make individual
estimates of these additions, in all the thousand cases, though
each estimate may be 2s. wrong, if there is no bias, the resulting
error on the average may be expected to be $\frac{2s.}{\sqrt{1000}}$, that is only ¾d.:
our whole error is now not far from 1d., instead of 2s. In
estimating the accuracy of published averages, these principles 
should be always borne in mind, and the possibility of biassed 
errors always considered. 
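The arithmetic of the agricultural wage example can be set out in a few lines of Python. The function name is hypothetical, and the conversion of 12 pence to the shilling is the only figure not stated in the passage itself.

```python
import math

PENCE_PER_SHILLING = 12


def typical_error_of_average(typical_error, number_of_returns):
    """First approximation E / sqrt(n) for unbiassed errors."""
    return typical_error / math.sqrt(number_of_returns)


# Unbiassed error of 1s. in each of a thousand wage returns.
wage_error_pence = typical_error_of_average(1 * PENCE_PER_SHILLING, 1000)

# Unbiassed error of 2s. in each of a thousand estimates of extra earnings.
extras_error_pence = typical_error_of_average(2 * PENCE_PER_SHILLING, 1000)

# Biassed error if the extra earnings are omitted altogether: about 2s.
omission_error_pence = 2 * PENCE_PER_SHILLING

print(round(wage_error_pence, 2), round(extras_error_pence, 2),
      omission_error_pence)
```

The two unbiassed errors come to roughly three-eighths and three-quarters of a penny, while the biassed error from omitting the extras remains at 24 pence however many returns are collected.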

Accuracy of comparisons.

When we are dealing with the errors of a ratio the case is
quite different. The error of a ratio is approximately equal to
the difference between the errors in its terms; if
$\eta$, $\eta'$ and $e$, $e'$ are the biassed and unbiassed errors
in the terms, then by Rules I. and V. $(\eta'-\eta)+(e'-e)$ is the
error in the ratio. Now the unbiassed error $(e'-e)$ is likely to
be of nearly the same magnitude as either $e$ or $e'$;* if, as in the
above example, $e$ and $e'$ are unlikely to be much greater than . . . ,
$(e'-e)$ would be unlikely to be much greater than . . . . But
$(\eta'-\eta)$, the result of the biassed errors, will, if the bias in both
terms of the ratio was in the same sense (positive in both, or 
negative in both), be less than the original errors. If we have 
made the estimates of both terms on precisely similar methods, 
if we have asked the same questions of the same classes of 
persons, included and omitted the same details on both occasions, 
we shall have made the same errors of bias in both estimates. 
To return to our previous illustration, if we have made the 
glaring mistake of omitting everything except average weekly 
wages in the income of an agricultural labourer on both occa- 
sions, the only resulting error in the ratio will be that due to 
the change in these extra payments, which in short periods is 
likely to be small. Or, if we had taken summer wages as the 



* If $E$ is the probable error in $e$ or $e'$, then $E\sqrt{2}$ is the probable error in
their difference. See p. 305, infra.




average for the year in both cases, the error in the ratio will 
depend only on the change in the relation of summer wages to 
that average. Hence the error in the ratio of two estimates 
at different dates of a slowly changing quantity is, if the 
estimates are made on similar methods, often much smaller 
than the error in either estimate singly ; for the unbiassed error 
is little greater, and the more important biassed error is much 
diminished. We need not now know of the existence of the 
biassed errors; they will disappear of themselves. If we are 
aware that there are biassed errors, and have any means of 
making fairly good estimates of them, it will be worth doing ; 
but we shall make a great mistake if we correct the bias in 
one year and leave it uncorrected in another.

Need for uniformity in structure of serial returns.

For purposes of
comparison it is very seldom of much use and often of great
disutility to make the later estimate more accurate than the
earlier. The error resulting from unbiassed errors
can indeed be diminished a little,* but the error
resulting from the more important biassed errors
will only be increased. All Government officials and others who
compile annual returns are in a dilemma : to make their annual 
statements accurate in themselves, they should always be strain- 
ing after improvements, they should always be watching for 
changes in the quantities measured and adapting their methods 
and tabulations to these changes ; but to make their annual 
returns comparable with each other, they should be absolutely 
conservative, and cling to any mistakes they or their pre- 
decessors have made in the past with all the strength red tape 
can give them, being careful, however, not to add to the mistakes 
or make new omissions. The dilemma can in some cases be 
avoided ; for when an improved method is introduced, the 
tabulation can sometimes be given for a few years both on 
the old and on the new plans ; then when the difference 
introduced by the change is known, the earlier figures can be 
brought to the greater precision of the later. Thus the Board 
of Trade has recently included in the tabulation of exports 
ships which, leaving our shores with merchandise, are them- 



* For if $E$ and $E_1$ be typical of the unbiassed errors at the two dates,
then $\sqrt{E_1^2+E^2}$ is typical of the error in the ratio, which diminishes with
either $E$ or $E_1$. See p. 305, infra.






selves sold to a foreign owner ; and we have the following 
tabulation :* — 





                                                   1899.            1898.
Exports of Home Products (exclusive of
  ships sold to foreigners)                        £255,465,000     £233,359,000
Re-exports of Home and Colonial Merchandise          65,020,000       60,655,000
                                                   ------------     ------------
    Total                                          £320,485,000     £294,014,000
Value of New Ships exported                           9,195,000     Not stated.
                                                   ------------
    New total                                      £329,680,000



Ignorance of slight alterations in the collection and tabulation
of material has been the cause of many statistical mistakes.

Results.

To sum up the chief results of this chapter: there are two
processes which tend to accuracy: averaging, which diminishes
unbiassed errors; and comparison, which diminishes
biassed error. The errors in weights are seldom
so important as the other errors which are present in estimates.
Errors in a result cannot, of course, be calculated, but can be 
expressed in terms of errors in the items, from which it comes ; 
we cannot attain certainty, but we can indicate processes which 
diminish errors, and with the help of mathematics measure the 
extent of diminution. Initial errors are diminished most, when 
we calculate the ratios of weighted averages of similar and 
similarly estimated quantities. Index-numbers, which we dis- 
cuss in the next chapter, are examples of this class. 

The accuracy resulting from the process of sampling requires 
more mathematical treatment, and is dealt with in Part II., 
Section V. 



* Quoted from the Economist, 17th February 1900, in the Statistical
Journal of March 1900.



CHAPTER IX. 
INDEX-NUMBERS. 




The discussion of index -numbers supplies so good an illustra- 
tion of the principles laid down in the last chapter, and index- 
numbers are so important in themselves, that, though it is our 
intention to avoid special questions, it will be worth while to 
devote a short chapter to them. 

Function of index-numbers.

Index-numbers are used to measure the change in some
quantity which we cannot observe directly, which we know to
have a definite influence on many other quantities
which we can so observe, tending to increase all,
or diminish all, while this influence is concealed by the action
of many causes affecting the separate quantities in various ways.
Thus, to take three of the quantities to which index-numbers 
are applied, the change in the relation of the precious metals 
to the work to be done by them affects prices of all com- 
modities, but very many other causes are at work affecting the 
prices of separate groups of commodities ; there are general 
causes tending to raise the wage of a week's work of average 
skill, but this general increase is concealed by numberless minor 
causes affecting different grades of labour in different degrees ; 
the change in the consumption of goods by the working or other 
classes is a sufficiently definite quantity, but it can only be 
measured indirectly by observing the varying changes in the 
consumption of individual articles. 

The use of index-numbers is not, however, confined to these 
instances, but is nearly co-extensive with the field of statistics ; 
for we have limited the term statistics to the measurement of 
complex groups and their changes ; the object of statistics is to 
measure the action of the general laws which govern a hetero- 
geneous group, and the changes produced by general forces can 
be measured, as a rule, only by their effect in individual cases ; 
thus the method of index-numbers is at once applicable to the 




disentanglement of that which is common to the whole group
from those variations which are special to individual items.

Method of formation.

The general method of forming an index-number, e.g., of the
fall of prices, is as follows: We select commodities, whose prices
we can estimate accurately, and tabulate their prices
for a series of years. Choosing the prices in one
year or the average for a sequence of years as a base, we express
each series of prices as percentages, year by year, of their height 
in the chosen base-year, or their average in the chosen period. 
Then to find the index-number for any year in particular we 
take the average of the percentages in that year. 
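The method just described can be sketched in Python as follows. The commodities and prices are invented for illustration, the first year is taken as base, and the function name is hypothetical.

```python
def simple_index_numbers(prices_by_commodity, base_year=0):
    """Unweighted index: average of each year's prices as % of the base year."""
    relatives = {
        name: [100.0 * price / series[base_year] for price in series]
        for name, series in prices_by_commodity.items()
    }
    years = len(next(iter(prices_by_commodity.values())))
    return [
        sum(series[year] for series in relatives.values()) / len(relatives)
        for year in range(years)
    ]


# Invented prices for three commodities over three years.
prices = {
    "wheat": [50.0, 45.0, 40.0],
    "iron": [20.0, 21.0, 18.0],
    "cotton": [8.0, 6.0, 6.0],
}

index = simple_index_numbers(prices)
print([round(value, 1) for value in index])
```

In the second year iron rises while wheat and cotton fall, and the index-number of 90 records only the movement common to the group.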

Astronomical analogy.

The problem, of which index-numbers should give the
numerical solution, may be compared to that presented to
astronomers who estimate the motions of the sun
by observing those of the stars. As the sun and
earth move towards some distant point, say in the constellation
Hercules, the stars have an apparent motion, due to the unper- 
ceived motion of the observer ; those in the region of space 
towards which he is travelling appear to be spreading out, as 
the distances separating them gradually subtend wider angles, 
while those in the region from which he is moving appear to 
close together, and those in directions perpendicular to the line 
of movement appear to move backward. Meanwhile all these 
stars have their proper motions, as rapid as that of the sun, 
but in as many different directions as there are stars. On 
the whole there is a trend in the directions determined by the 
sun's motion, but in individual cases this trend is entirely lost. 
So when a change in the currency has a general influence on 
prices, this influence is concealed by the movements due to 
causes affecting only some of the commodities. In both cases 
it is possible to find the general trend, if sufficient accurate 
observations are available. In both cases the problem is com- 
plicated by the possibility of links connecting the movements of 
groups of the stars or of the prices. 

Index-numbers and samples.

It has sometimes been supposed that we can estimate the
effects of general causes directly; that we can, for instance, obtain
an objective measurement of the change in the purchasing
power of gold, by evaluating it at two dates
in terms of all commodities purchased, weighted by the amount 
spent on each ; but it is better to neglect this method at once 
both as impracticable and as not answering the purpose of index- 




numbers, for the effects of minor causes affecting separate com- 
modities would not then be necessarily separated from the main 
cause. 

Suppose that the changes in a group of quantities are deter- 
mined by one general force which acts on all in the same sense, 
that is, tends to increase all or decrease all, and by several other 
forces each of which acts on one or more of the quantities, and 
some of which tend to increase, others to decrease the quantities 
they affect ; then of the special forces, some will tend to increase, 
others to diminish the average, while the general force will 
have a cumulative effect entirely towards increasing, or entirely 
towards diminishing it. If the separate effects of the special 
forces are small compared with their number, they will tend to 
neutralise one another in their influence on the average ; and 
the change in the average will show the influence of the general 
cause only.* In the language of the last chapter, the special 
forces produce unbiassed changes, which are negligible in their 
effect on an average, in comparison with the biassed changes 
produced by the general force. To obtain this elimination it is 
necessary to take random samples, so that the laws of probability 
may have free action ; and the two questions to be discussed are 
the choice of samples, and the choice of weights to be applied to 
them. 

Unimportance of weights.

As we have already seen, the effect produced by varying the
system of weights applied to so few as 30 or 40 numbers is
very slight, and the error resulting from errors in
weighting is in many cases much smaller than
the error resulting from faulty measurements of the quantities
weighted. We shall presently show† that the precision of an
average increases with the number of like quantities averaged.
From these principles it is clear that it is more important to 
increase the number of our samples than to attempt accurate 
calculations of the proper weights to give them. 

The choice of samples is in practice very much limited, for 
in calculations extending over long periods we are dependent on 
the accidental preservation of records ; and when we have taken 



* This abbreviated statement should be criticised in the light of Part II.,
Sect. V., infra. See also Report of Committee on Variations in the Monetary
Standard, British Association, 1888-90.

† See p. 305, infra.




into our reckoning all the measurements which can be accurately 
made, the number of samples barely comes up to the minimum 
necessary for the normal action of the laws of probability. 

The Board of Trade index.

There are many index-numbers of wholesale prices extant,
some of which we may pass in review. The Board of Trade
publish the recorded quantity and value of goods
imported and exported, and the average prices of
these goods can be calculated. Those commodities are selected
which occur in the returns for the whole period chosen. A
particular year is chosen as base; then the goods are valued
in all other years separately at their prices in the base year;
the total of these values in any year is the sum which the
goods would have been worth if their prices had remained
unchanged; the ratio of this value to that actually recorded
is the ratio of their average price in the base year to their
average price in the other year selected (if the term average
is used broadly), and if the first term of this ratio is equated
to 100, the second term is the index-number required for the
year selected, expressed as a percentage of the number for the
base year. It is at once evident that we are here dealing with 
weighted averages. 

Systems of weights.

Let $p_1, p_2, p_3, \ldots$ be the prices in the base year of units
of the goods selected, and $r_1p_1, r_2p_2, r_3p_3, \ldots$ the prices in
the year for which we require an index-number:
then $r_1, r_2, r_3, \ldots$ measure the changes of prices
for the separate commodities, and these $r$'s are the samples
from which we are to deduce the general change of price.
The weights used in the process described may be found
thus: let $b_1, b_2, b_3, \ldots$ be the numbers of units of goods in
the selected year; then the total value in the selected year
at the prices of that year is $(b_1r_1p_1+b_2r_2p_2+\cdots)$, and at the
prices of the base year is $(b_1p_1+b_2p_2+\cdots)$; the ratio is
$\Sigma brp : \Sigma bp$, and the index-number for the selected year is

$$100\times\frac{\Sigma brp}{\Sigma bp} = 100\times\Sigma\left(r\cdot\frac{bp}{\Sigma bp}\right).$$

Here the weights applied to the $r$'s are the values which the
corresponding goods in the selected year would have borne at 
the prices of the base year. It is clear that the selection of the 
standard year affects the weights, for any particular commodity 
can be given special weight by choosing as base a year in which 
its price is high, and much trouble has been spent in searching 






for a " normal " year ; but though the weights of separate com- 
modities are affected, it does not follow that the average will 
be altered, and we should expect from the principle laid down 
above that the change would be very slight. In fact we have
the following figures:



INDEX NUMBERS OF 1886 AND 1883 COMPARED.*

                          Imports.                              Exports.
Weights:        Values at prices of                   Values at prices of
                1873     1883     1861     1881       1873     1883     1861     1881

1883            100      100      100      100        100      100      100      100
1886            81.7     82.1     82.9     82.3       88½      88       87½      89


It is possible to produce figures which show a variation 
caused by a change of base year, but it is done by choosing 
samples which lend themselves to the special argument. 
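The weighting implied by the Board of Trade method can be made concrete with a short Python sketch. The quantities, prices and ratios below are invented and the function names are hypothetical; the sketch checks that valuing the goods of the selected year at both sets of prices gives the same number as a weighted average of the price ratios, the weights being the values at base-year prices.

```python
def base_valued_index(quantities, base_prices, price_ratios):
    """100 * (value at selected-year prices) / (value at base-year prices)."""
    value_selected = sum(
        b * r * p for b, p, r in zip(quantities, base_prices, price_ratios)
    )
    value_base = sum(b * p for b, p in zip(quantities, base_prices))
    return 100.0 * value_selected / value_base


def weighted_mean_of_ratios(quantities, base_prices, price_ratios):
    """100 * weighted mean of the ratios, with weights b * p."""
    weights = [b * p for b, p in zip(quantities, base_prices)]
    weighted_sum = sum(w * r for w, r in zip(weights, price_ratios))
    return 100.0 * weighted_sum / sum(weights)


# Invented data for four commodities.
b = [120.0, 40.0, 300.0, 15.0]     # units of goods in the selected year
p = [5.0, 12.0, 1.5, 40.0]         # prices in the base year
r = [0.80, 0.85, 0.78, 0.90]       # selected-year price / base-year price

index = base_valued_index(b, p, r)
as_weighted_mean = weighted_mean_of_ratios(b, p, r)
unweighted = 100.0 * sum(r) / len(r)

print(round(index, 2), round(as_weighted_mean, 2), round(unweighted, 2))
```

In this invented case the weighted and unweighted figures differ by less than one point, which is in keeping with the argument that the choice of weights matters little when the ratios themselves are not very unequal.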

Since so great an alteration in choice of weights makes so
little difference, it is worth while to see if we need even keep
the weight due to the quantities imported (the $b$'s in the above
formulae). The following table may be quoted† to show that
these weights even have practically no influence:

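Index-Numbers for 1895, when that of 1881 is 100, obtained by
Various Systems of Weighting.

Columns:
(1) Weighted by Values of 1895 Quantities at 1881 Prices.
(2) Weighted by Declared Values in 1881.
(3) Ratios of Prices ($r_1, r_2, \ldots$): Arithmetic Mean.
(4) Ratios of Prices ($r_1, r_2, \ldots$): Median.
(5) Ratios of Prices ($r_1, r_2, \ldots$): Geometric Mean.
(6) Reciprocal of A.M. of $\frac{1}{r_1}, \frac{1}{r_2}$, &c.
(7) Economist's Figures.

              (1)     (2)     (3)     (4)     (5)     (6)     (7)
Imports       67½     69      73½     72½     72½     69      71
Exports       83      87      82      81      78½     75

Column (7) gives a single figure for imports and exports together.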



* From the Economic Journal and the Statistical Journal, both June
1897.

† From the Economic Journal (with a correction in the statement of
weights).




In the first column of figures the goods of 1895 ($b_1, b_2, b_3 \ldots$
units at prices $p'_1, p'_2, p'_3 \ldots$) are valued at the prices of
similar goods in 1881 ($p_1, p_2, p_3 \ldots$). The ratio of their
new to their old value, $\Sigma(b p') / \Sigma(b p)$, is the ratio of the new
index-number to 100. In the next column the index-number is
obtained by valuing the quantities of 1881 at the prices of 1895;
then the ratio of the new value to the old,
$\Sigma(a p') / \Sigma(a p)$ (where $a_1, a_2 \ldots$ are the quantities in 1881)
$= \Sigma(a p \cdot r) / \Sigma(a p)$, is the ratio of the index-number of 1895
to 100 for 1881. In the next three columns the arithmetic mean, the
median, and the geometric mean of the $r$'s* are given. In the last
column but one the arithmetic mean of $1/r_1, 1/r_2 \ldots$, that is of the
ratios of the prices of 1881 to those of 1895, is calculated, and the
ratio of this mean to 100 equals the ratio of 100 to a new index-number,
which corresponds to the former arithmetic mean with the
years 1881 and 1895 interchanged. The figure in the last
column is calculated from material given in the Economist;†
every year the imports and exports are valued at their prices
in the previous year, and thus an annual ratio is given similar
to that in the first column of figures in the table just given;
the number 100, taken for 1881, is multiplied by this annual
ratio year by year till 1895, and the number 71 is the result.
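[Algebraically, writing $q_t$ and $p_t$ for the quantities and prices of
the year $t$, this index-number is

$$100 \times \frac{\Sigma(q_{1882}\,p_{1882})}{\Sigma(q_{1882}\,p_{1881})} \times \frac{\Sigma(q_{1883}\,p_{1883})}{\Sigma(q_{1883}\,p_{1882})} \times \cdots \times \frac{\Sigma(q_{1895}\,p_{1895})}{\Sigma(q_{1895}\,p_{1894})}.]$$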



A more complete analysis of these figures, and an investiga- 
tion as to the causes of the divergence between the export 
indices 87 and 75, would show which of the methods should be 
adopted. Here we will be content with noticing that the 
unweighted average, 82, is very near the first weighted 
average, 83. 
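The systems of weighting compared in the table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three commodities, their prices and their quantities are hypothetical (they are not the import or export returns), and the function names are invented for the sketch.

```python
from statistics import geometric_mean, mean, median

# Hypothetical data, not the book's. For each commodity:
# (base-year price, later-year price, base-year quantity, later-year quantity).
GOODS = [
    (10.0, 8.0, 5.0, 6.0),
    (4.0, 3.0, 20.0, 22.0),
    (25.0, 15.0, 2.0, 3.0),
]


def later_quantity_index(goods):
    """Later-year goods valued at later and at base prices (first column)."""
    new_value = sum(p1 * q1 for _p0, p1, _q0, q1 in goods)
    old_value = sum(p0 * q1 for p0, _p1, _q0, q1 in goods)
    return 100 * new_value / old_value


def base_quantity_index(goods):
    """Base-year goods valued at later and at base prices (second column)."""
    new_value = sum(p1 * q0 for _p0, p1, q0, _q1 in goods)
    old_value = sum(p0 * q0 for p0, _p1, q0, _q1 in goods)
    return 100 * new_value / old_value


def unweighted_indices(goods):
    """Arithmetic mean, median, geometric mean and reciprocal form of the r's."""
    ratios = [p1 / p0 for p0, p1, _q0, _q1 in goods]
    reciprocals = [1 / r for r in ratios]
    return {
        "arithmetic": 100 * mean(ratios),
        "median": 100 * median(ratios),
        "geometric": 100 * geometric_mean(ratios),
        "reciprocal": 100 / mean(reciprocals),
    }


if __name__ == "__main__":
    print(round(later_quantity_index(GOODS), 1))
    print(round(base_quantity_index(GOODS), 1))
    for name, value in unweighted_indices(GOODS).items():
        print(name, round(value, 1))
```

With any set of positive ratios the reciprocal form cannot exceed the geometric mean, nor the geometric mean the arithmetic mean, which agrees with the order of the last four columns of the table.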

Further methods of dealing with such weights are given on 
p. 226, under Retail Index-Numbers. 



* Where $r_1 = p'_1 / p_1$, $r_2 = p'_2 / p_2$, &c.

† See, for instance, the quotation from the Economist in the Statistical
Journal, 1900, p. 129.



INDEX-NUMBERS. 223 

[Sidenote: Objective measure.]
The advantage of index-numbers on the Board of Trade
basis is that they measure approximately an objective quantity,
and a result is obtained which can be stated in
terms which appeal to the ordinary man who is
not a statistician: such as, "The imports of 1895 would have
cost half as much again if their prices had been those of 1881;"
but, as pointed out above, it does not follow that this index
is the best measure of the less-definable quantity, "Fall in the
price of imports," where we imagine a general cause affecting
this class of commodities whose action is modified by other
partial causes.

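[Sidenote: Geometric mean.]
A special advantage of the geometric mean* is that the
results it gives are independent of the year chosen as base; for
if $p_1, p_2, \ldots p_n$ and $p'_1, p'_2, \ldots p'_n$ are the prices
in two years,

$$\sqrt[n]{p_1 p_2 \cdots p_n} : \sqrt[n]{p'_1 p'_2 \cdots p'_n} :: 100 : I_2,$$

the required index-number; hence

$$I_2 = 100 \sqrt[n]{\frac{p'_1}{p_1} \cdot \frac{p'_2}{p_2} \cdots \frac{p'_n}{p_n}}.$$

Similarly, if $p''_1, p''_2, \ldots p''_n$ are the prices in a third year, and
$I_3$ is its index-number on the same base,

$$100 \cdot \frac{I_3}{I_2} = 100 \sqrt[n]{\frac{p''_1}{p'_1} \times \frac{p''_2}{p'_2} \times \cdots \times \frac{p''_n}{p'_n}},$$

which would be the value obtained for the third year if only
the second and third were considered. Considering the extra
labour involved in calculating this mean, and the small advantage
obtained by any alteration in the weighting, its use is not to
be generally recommended.†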
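The independence of the base year can be checked numerically. The sketch below is an illustration only: the prices of three commodities in three years are hypothetical, and the names `gm_index` and `am_index` are invented here. It rebases the third year on the second by way of the first-year base and compares the result with the index computed directly from the second and third years.

```python
from statistics import geometric_mean, mean

# Hypothetical prices of three commodities in three successive years.
PRICES = {
    1: [10.0, 4.0, 25.0],
    2: [8.0, 5.0, 20.0],
    3: [9.0, 3.0, 30.0],
}


def price_ratios(base, year):
    """Ratios of each price in `year` to the same commodity's price in `base`."""
    return [p / b for p, b in zip(PRICES[year], PRICES[base])]


def gm_index(base, year):
    """Index-number of `year` (base = 100) by the geometric mean of ratios."""
    return 100 * geometric_mean(price_ratios(base, year))


def am_index(base, year):
    """Index-number of `year` (base = 100) by the arithmetic mean of ratios."""
    return 100 * mean(price_ratios(base, year))


# Third year expressed on the second, passing through the first-year base.
gm_rebased = 100 * gm_index(1, 3) / gm_index(1, 2)
am_rebased = 100 * am_index(1, 3) / am_index(1, 2)

if __name__ == "__main__":
    print(round(gm_rebased, 2), round(gm_index(2, 3), 2))
    print(round(am_rebased, 2), round(am_index(2, 3), 2))
```

For the geometric mean the two routes agree; for the arithmetic mean, with these hypothetical prices, they do not.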

[Sidenote: Other Index-numbers.]
Mr Sauerbeck and the Economist both avoid in part the
difficulty of weighting the separate ratios by their relative
importance in consumption, by selecting from those
commodities whose prices are most accurately
determined more instances of such widely consumed articles
as wheat than of less important commodities such as linseed.
Mr Sauerbeck has, in his annual articles in the Journal of the
Royal Statistical Society, verified the correspondence of the
unweighted average of his 56 ratios with the average of the same
weighted on various principles.

* Pointed out by Professor Edgeworth.

† On this point, and on others in this chapter, see article
Index-Numbers, in Palgrave's Dictionary of Political Economy.




[Sidenote: Importance of right choice of samples.]
While the choice of the special weights to be employed is,
when the number of ratios taken is at all considerable, quite
unimportant, the choice of the quantities dealt
with has great effect on the result. Thus import
figures, relating to raw materials and the produce
of other countries, do not lead to the same index-numbers as
export figures dealing with the price of our own produce,
though the tables just given show that they are little affected by
weights; and neither of these agree closely with Mr Sauerbeck's
or the Economist's numbers, and these again are not in complete
agreement. The samples on which these four sets of numbers
are based are from different groups of commodities, and the
numbers show that the same forces do not affect these groups
in the same degree. When we have so multiplied our samples,
that we can subdivide them without affecting the index-numbers
deduced, we may expect our results to represent the required
measurement.*

[Sidenote: Great advantage of the median.]
If we compare the Economist index-numbers with Sauerbeck's
during the period 1860-70, we see that the former show
a very much greater increase during the cotton
famine than the latter. An index-number which
can be greatly disturbed by fluctuations, however violent, in
only one group of commodities, is clearly wanting in some of
the chief qualities of a general measure of price levels. A very
simple means of avoiding this difficulty, and indeed all the
intricacies of weighting, is to take the median of all the price
ratios of a particular year as the index-number of that year.
It is perhaps impossible to show theoretically that any other
average satisfies the required conditions better than the median,
and there can be no doubt that it is practically the easiest to
calculate.
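The steadiness of the median under a violent movement in a single commodity can be seen in a small sketch. The seven price ratios below are hypothetical (they are not the Economist's or Mr Sauerbeck's figures); in the second list one of them is replaced by 300 to imitate a disturbance such as that of cotton in the famine years.

```python
from statistics import mean, median

# Hypothetical price ratios (base year = 100) for seven commodities.
RATIOS = [98, 101, 103, 99, 102, 100, 97]

# The same list with one commodity violently disturbed.
DISTURBED = RATIOS[:-1] + [300]

if __name__ == "__main__":
    print("undisturbed: mean", mean(RATIOS), "median", median(RATIOS))
    print("disturbed:   mean", mean(DISTURBED), "median", median(DISTURBED))
```

In this sketch the arithmetic mean is carried from 100 to 129 by the one disturbed ratio, while the median moves only from 100 to 101.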

* Mr Sauerbeck's numbers are to be found in annual articles by him in
the Statistical Journal; and a diagram showing them from 1820 is
published by Effingham Wilson (1s.).

[Sidenote: Proposed standard.]
If, on the other hand, paucity of data makes the inclusion of
weights necessary, and the popular desire for concrete
measurements makes a fine show of weighting expedient,
we perhaps cannot do better than to adopt the
standard proposed by the Committee of the British Association,
already mentioned, for the construction of an index-number,
which might be the basis of business transactions involving
future payments. This standard is as follows:

Basis of Index-Number recommended by the Committee appointed by
the Economic Section of the British Association, 1888.

Articles.             Estimated     Hence     Prices to be taken from
                      Expenditure   Weights
                      per Annum     assigned.
                      on each
                      (£000,000's
                      omitted).

Wheat                 60            ...       Gazette average, English wheat.
Barley                30            ...       Gazette average, English barley.
Oats                  50            ...       Gazette average, English oats.
Potatoes, rice, &c.   ...           ...       Av. import price, potatoes.
Meat                  ...           ...       Market quotations, live meat, Smithfield.
Fish                  20            ...       Board of Trade Returns; average per cwt. landed.
Cheese, butter, milk  60            ...       Cheese and butter, average import price.
Sugar                 30            ...       Av. import price, refined sugar.
Tea                   20            ...       Av. import price, tea.
Beer                  ...           ...       Av. export price, beer.
Spirits               40            ...       Av. import price, spirits.
Wine                  10            ...       Av. import price, wine.
Tobacco               10            ...       Av. import price, tobacco.
Cotton                20            ...       Av. import price, cotton.
Wool                  30            ...       Av. import price, wool.
Silk                  20            ...       Av. import price, raw silk.
Leather               10            ...       Av. import price, hides.
Coal                  ...           ...       Av. export price, coal.
Iron                  50            ...       Market price, Scotch pig-iron.
Copper                25            ...       Av. import price, copper ore.
Lead, zinc, tin       25            ...       Av. import price, lead ore.
Timber                30            ...       Average import price.
Petroleum             5             ...       Average import price.
Indigo                5             ...       Average import price.
Flax and linseed      10            ...       Average import price.
Palm oil              5             ...       Average import price.
Caoutchouc            5             ...       Average import price.

[Sidenote: Retail price index.]
Since we can only obtain rough correspondence in dealing
with wholesale prices, we cannot expect to be able to measure
retail prices with any great precision. For we saw
in the preceding chapter that the error in an average
bears a definite relation to the errors in the items which
compose it; if the errors in the items are on the whole doubled,
it is likely that the errors in the average and in the ratio of two
averages will also be doubled, and we shall need four times* as
many samples to restore the precision. Unfortunately the
material for computing a retail index-number is even more
incomplete than that for wholesale prices, and owing to the smaller
number of articles that can be included, and the preponderance
of such items as bread and rent, the question of weighting
becomes of more importance.

* See p. 305, infra.
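The statement that doubled errors call for four times as many samples can be illustrated by the usual rule that the error of an average of independent items varies inversely as the square root of their number. The sketch below assumes that rule; the function name and the figures are hypothetical.

```python
from math import isclose, sqrt


def error_of_average(item_error, n_samples):
    """Error of an average of n independent items, each with the same error.

    Assumes the square-root rule: the error falls as 1 / sqrt(n).
    """
    return item_error / sqrt(n_samples)


if __name__ == "__main__":
    original = error_of_average(1.0, 100)   # hypothetical wholesale case
    doubled = error_of_average(2.0, 100)    # item errors doubled
    restored = error_of_average(2.0, 400)   # four times as many samples
    print(original, doubled, restored)
    print(isclose(original, restored))
```

On this assumption the precision lost by doubling the errors of the items is recovered exactly when the number of samples is quadrupled.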

[Sidenote: Special difficulties.]
When we wish to construct an index-number to show the
purchasing power of money to special classes, we must take
into account some considerations which can be
ignored when dealing with wholesale price numbers.
Different classes of persons at the same time, and the
same classes at different times, spend their income in varying
proportions on different objects. If we could collect enough
sufficiently accurate samples, this fact would not matter so much;
but it would still be of some importance owing to the tendency
to make increased purchases of cheapening commodities. As it
is, it would be necessary to construct separate index-numbers for
each class and each district. The difficulty of insufficient and
inaccurate data cannot at present be overcome; but as it is
possible that we may in the future get definite records of retail
prices sufficiently numerous to make up for their want of
precision, we may glance at the other details of the problem. To
form an index-number for a particular class of people, we need
records of the method of expenditure of their income at all
the dates in question, of sufficient numbers to obtain the slight
precision which weighting needs. [Sidenote: Methods of weighting.]
Then if we had fairly good records of retail prices several methods
of weighting are open to us,* all of which are likely to give
nearly the same result. The necessity of weighting and the
methods are best shown by a numerical illustration.
Suppose the following records of expenditure:



First Year.                   s.  d.     Second Year.                s.  d.

6 quarterns bread at 6d.      3   0      7 quarterns at 5d.          2  11
4 lbs. meat at 7d.            2   4      5 lbs. at 8d.               3   4
½ lb. tea at 3s.              1   6      1½ lbs. at 1s. 4d.          2   0
                              ------                                 ------
                              6  10                                  8   3



* See article on Wages, Nominal and Real, in Palgrave's Dictionary of
Political Economy, pp. 640-641.

The second year's budget at the first year's prices would cost
10s. 11d.; index-number of retail prices on this basis:

$$100 \times \frac{8\text{s. }3\text{d.}}{10\text{s. }11\text{d.}} = 75.6 \quad (a).$$

The first year's budget at the second year's prices would cost
5s. 10d.; index-number on this basis:

$$100 \times \frac{5\text{s. }10\text{d.}}{6\text{s. }10\text{d.}} = 85.4 \quad (b).$$

The ratio of the costs of 6½ quarterns, 4½ lbs. meat, 1 lb. tea,
the averages of the quantities in the two years, at the two sets
of prices, is 8s. 10½d. to 7s. 0½d. = 100 : 73.2 (c).

If we disregard the records of changing expenditure, we
find that the unweighted average of the three ratios of prices is
100 : 80.8 (d).

If we suppose the second sum (8s. 3d.) to be spent on bread,
meat, and tea in the same proportional parts as the first sum
(6s. 10d.), we have:

$\frac{3\text{s.}}{6\text{s. }10\text{d.}}$ of 8s. 3d., i.e., about 43½d., would have bought
$\frac{43\frac{1}{2}}{5}$ quarterns at the second price, which would have cost
$\frac{43\frac{1}{2}}{5} \times 6$d. at first price.

Working out the other items in a similar way, we find that
the second sum distributed in the same proportions as the first
would have bought goods which would have cost 10s. 10½d. at
the first prices; and the resulting index-number is:

$$\frac{8\text{s. }3\text{d.}}{10\text{s. }10\tfrac{1}{2}\text{d.}} \text{ of } 100 = 62.8 \quad (e).$$

This reduction is due to the large expenditure on bread on
this hypothesis, which can easily be shown to be an unreasonable
one if we suppose the price of bread to be reduced to nothing,
while the other prices rise; then on this hypothesis the fall is
infinite.

Of the above numbers, (c), (d), and (e) do not seem to rest on
sound hypotheses; (a) clearly overstates and (b) clearly
understates the fall; and therefore some number between (a) and (b)
is the number required. If (a) and (b) lie close together there
is no further difficulty; if they differ by much they may be
regarded as inferior and superior limits of the index-number,
which may be estimated as their arithmetic mean (80.5) as a first
approximation.
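The five calculations can be retraced with exact fractions. The sketch below takes the quantities and prices from the two budgets as printed and works in pence; the variable names are invented for the illustration. Recomputed in this way, (a) and (b) come to 75.6 and 85.4 and their mean to 80.5, as in the text; (c), (d) and (e) come to about 79.3, 80.7 and 75.8, so the printed 73.2, 80.8 and 62.8 may rest on a different working or on a misprint, and the code should be read as a check rather than as the author's own arithmetic.

```python
from fractions import Fraction as F

# Quantities and prices (in pence) from the two budgets in the text.
Q1 = {"bread": F(6), "meat": F(4), "tea": F(1, 2)}
P1 = {"bread": F(6), "meat": F(7), "tea": F(36)}
Q2 = {"bread": F(7), "meat": F(5), "tea": F(3, 2)}
P2 = {"bread": F(5), "meat": F(8), "tea": F(16)}


def cost(quantities, prices):
    """Total cost in pence of a budget at a given set of prices."""
    return sum(quantities[k] * prices[k] for k in quantities)


# (a) second year's budget and (b) first year's budget, at both price sets.
a = 100 * cost(Q2, P2) / cost(Q2, P1)
b = 100 * cost(Q1, P2) / cost(Q1, P1)

# (c) averages of the two years' quantities, at both price sets.
Q_AVG = {k: (Q1[k] + Q2[k]) / 2 for k in Q1}
c = 100 * cost(Q_AVG, P2) / cost(Q_AVG, P1)

# (d) unweighted average of the three price ratios.
d = 100 * sum(P2[k] / P1[k] for k in P1) / len(P1)

# (e) second sum spent in the first year's proportions, bought at the
# second prices, and then valued at the first prices.
spent2 = cost(Q2, P2)
shares = {k: Q1[k] * P1[k] / cost(Q1, P1) for k in Q1}
Q_E = {k: shares[k] * spent2 / P2[k] for k in Q1}
e = 100 * spent2 / cost(Q_E, P1)

if __name__ == "__main__":
    for name, value in zip("abcde", (a, b, c, d, e)):
        print(f"({name}) {float(value):.1f}")
    print(f"mean of (a) and (b): {float((a + b) / 2):.1f}")
```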

[Sidenote: Further difficulties.]
While it is useful to have a definite means of calculating
these numbers to bring extravagant statements to a numerical
test, there are two further considerations which
hinder the complete solution of the problem. In
all budgets rent is an important item, and there seems no
prospect of obtaining any good estimate of the relation between
increasing rent and improving accommodation, allowing for the
benefits of public expenditure paid by rates included in rent.
Again, if we consider, not how money is spent, but how it might
be spent, we should have to introduce a more general factor;
for the margin which remains when necessities are satisfied has
a rapidly growing purchasing power, as the products of machinery
increase in variety and diminish in price; perhaps the calculated
fall in wholesale prices forms a fair measure of this growth.

[Sidenote: Index-numbers of consumption.]
Leaving this somewhat unfruitful topic, let us return for a
moment to the measurement of a quantity more typical of
index-numbers.* If we have to measure the action of a cause,
which affects quantities which have no common
measure, we are still able to apply index-numbers.
A general increase has taken place in the consumption of
imported goods, and if we can measure this increase
independently of any change in price, we can use it as an argument to
support the alleged increase in real wages. The only common
measure of bread, currants, cheese, meat, &c., of practical value is
their price, their weight being useless for the purpose. If the
quantities consumed year by year of a number of such
commodities are written down, expressed as percentages of the
consumption in any years (not necessarily the same), we have series
of numbers which only need weighting to form the index-number
required. We can in this case verify that any logical
choice of weights, based on their value or their assumed
importance, or even a random system of weights, gives much the
same index-number as the simple arithmetic averages; in fact, we
have a sufficiently good group of samples to render us nearly
independent of weights. When this is the case we can say with
safety that the number required lies in the neighbourhood of the
group given by the various systems of weights, and choose what
appears the most logical system for the estimate we adopt. In
the paper referred to, five different systems applied to only
fourteen commodities give results for the increase of
consumption all between 13.8 and 20.1 per cent. in the period 1873-96.

* The following illustration is based on Mr G. H. Wood's paper on
Some Statistics of Working Class Progress, cited above.

[Sidenote: Wage index-numbers.]
The application of index-numbers to wage statistics does not
involve any fresh principles. It is not permissible to ignore
weights in this case; for an unweighted average
would not allow for the general tendency to
increase numbers where wages are rising. There is great liability
to "biassed" errors in separate averages; for wages for
overtime, specially high piece-wages, wages of large uncombined
classes of low-skilled or badly paid workpeople, may often be
omitted in wage records. These biassed errors, however, tend to
disappear in comparison; and it may prove possible to construct
a wage index-number of very fair precision.

Note. — Pages 226-7 have been misunderstood. The method has been
confused with the measurement of the cost of a standard budget
(representing either minimum subsistence, or efficiency subsistence) with the items
the same at both dates. The cost of such a budget is very important in
relation to working-class progress, but in the text we are concerned with the
change of the purchasing power of money, expended on retail commodities,
which makes a different problem. Of course the three commodities used in
the test form only an abbreviation for a dozen or a score of items.

The methods (c) and (d) are ignored because they do not allow for the
correlation between distribution of expenditure and price; and (e) seems
contrary to experience, and is liable to great disturbance if a single article
becomes very cheap or very dear. These methods have their use in other
measurements by index-numbers.



CHAPTER X. 
INTERPOLATION. 




I. General. 

[Sidenote: Necessity of Interpolation.]
It is very often the case in practical statistics that we are not
able to make serial estimates as frequent, or descriptions of
groups as detailed, as is necessary for their use in
further investigations. Thus the population is
only counted once in ten years; but we need to bring monthly
and annual accounts (births, deaths, trade returns, &c.) into
close relation to the existing number of people, and estimates
for the budget and the yield of taxes must be based on the
assumed number of taxpayers for the current year; it is
therefore necessary to interpolate estimates for the number of
the people in intercensal years. Again, interpolation is needed
for the statement of the distribution of the population according
to age, a tabulation which is necessary for actuarial work and
for sociological purposes. The ages returned on the
householder's schedule are nominally correct to the year, but in
practice they are known to be inaccurate, tending to group
themselves in the neighbourhood of round numbers; but the
returns for such age periods as 35-45 years are more correct, since
the persons who return themselves as 40 years old are probably
within 5 years of that age. The original returns are so erroneous
that they are not published at all, but the numbers are only
given in the ten-yearly periods; from the numbers so given, it
is necessary to estimate the numbers for the individual years.
Again, the compilers of the wage census of 1886-91 enumerate
the numbers earning wages "of 15s. and under 20s.," "of
20s. and under 25s.," and so on, but not the numbers in
shilling limits. In problems relating to wages we often need
more detail; and when we are comparing these wages with a
similar group in France, we must devise a scheme by which
grades of 2 francs can be compared with grades of 5s., by a
suitable system of interpolation. Such a necessity is very
common when we wish to compare groups, which are similar
but tabulated on diverse systems. Thus, two countries conduct
their census at different dates. In one country the age groups
are of fifteen years, in another of ten; in one, "young persons"
are those under 21; in another, those under 18. Occasional
estimates seldom correspond in date; wage statistics are found
for 1840, 1850, and 1892 in France, and for 1866, 1885, 1886,
and 1891 in England. Similar differences are found when we
are comparing county with county; and a discussion of the
method of determining averages in such a case will illustrate
some of the elementary problems of interpolation.

[Sidenote: Elementary example.]
Suppose that the figures printed in Roman type in the
following table are accurate returns of the weekly
wages in three districts, and that we wish to find
the average change in the three together.

Years.       1860.   1862.    1864.    1866.    1870.    1871.    1875.    1878.    1880.    1881.
             s. d.   s. d.    s. d.    s. d.    s. d.    s. d.    s. d.    s. d.    s. d.    s. d.
District A   12 6    15 0     _15 0_   15 0     _15 0_   14 6     18 0     18 0     17 6     17 0
District B   18 0    _19 0_   19 0     _20 0_   20 0     _19 6_   _21 0_   21 0     _20 6_   20 0
District C   10 0    _11 0_   11 0     12 0     _12 0_   _12 0_   15 0     _15 0_   15 0     14 6
Average      13 6    15 0     15 0     15 8     15 8     15 4     18 0     18 0     17 8     17 2



It is clear that there is something to be learnt about the
general course of wages from the data, but the lessons are not
obvious. The following figures, printed in the table in italics,
are those which naturally suggest themselves. There is no sign
in A of any change between 1862 and 1866, so we write _15s._ for
1864. Judging from B, the figure for 1870 is not likely to have
been lower than that for 1864, so we write _15s._ for A in 1870.
A is now complete; we notice that in A the first rise was
complete by 1862, and assuming the same in B, we obtain _19s._ for
1862. In C there is a rise between 1864 and 1866, while in A
there is no change from 1866 to 1870; B will correspond if we
write _20s._ in 1866. If we write for B, _19s. 6d._ in 1871, _21s._ in
1875, and _20s. 6d._ in 1880, we shall have close correspondence
with A from 1866 to 1881. Similar reasons lead to the numbers
interpolated for C. The unweighted average can then be
calculated year by year, which could not be done directly from the
data. This average reflects all the changes in the original figures
and gives no special predominance to any. It may be regarded
as the most probable series that can be based on the given
information.
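Once the gaps are filled, the yearly average is a matter of simple arithmetic, which the following sketch carries out in pence. The wage figures are those of the table as far as they can be read, given and interpolated alike; the helper name `shillings_pence` is invented for the illustration.

```python
# Weekly wages in pence for each district, given and interpolated,
# in the order of the years listed.
YEARS = [1860, 1862, 1864, 1866, 1870, 1871, 1875, 1878, 1880, 1881]
WAGES = {
    "A": [150, 180, 180, 180, 180, 174, 216, 216, 210, 204],
    "B": [216, 228, 228, 240, 240, 234, 252, 252, 246, 240],
    "C": [120, 132, 132, 144, 144, 144, 180, 180, 180, 174],
}


def shillings_pence(pence):
    """Convert a sum in pence to (shillings, pence), to the nearest penny."""
    return divmod(round(pence), 12)


# Unweighted average of the three districts, year by year.
AVERAGES = [
    shillings_pence(sum(column) / len(column))
    for column in zip(*WAGES.values())
]

if __name__ == "__main__":
    for year, (s, d) in zip(YEARS, AVERAGES):
        print(f"{year}: {s}s. {d}d.")
```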

[Sidenote: Assumptions made.]
We will now notice the assumptions tacitly made in
proceeding by this method. First, it has been assumed that there
are no sudden jumps, that such a figure as 20s.
for A in 1864 is inadmissible; this is only justifiable
if we are acquainted with the general causes which influence
the rate of wages, and know that there was no violent
disturbance in the intermediate dates. We could not make this
assumption as to wages in the cotton trade in the time of the
American Civil War, nor can we make it over a long series
of years. Secondly, it has been assumed that in the absence
of evidence to the contrary the rise or fall has been uniform.
Thus, in B 1878-81, the wage in 1880 is assumed to be
intermediate between 1878 and 1881; if there had been no
indication from A that it was half-way between in point of wages,
it might have been said that in point of time it was two-thirds
of the way, and 20s. 8d. should be interpolated for 1879 and
20s. 4d. for 1880, if it was worth while to depart from round
numbers. Thirdly, it has been assumed that the course of
wages in the three districts was similar. Thus in A there is
a rise from 1860-62, but there is no further improvement at
any rate before 1866; it is consequently assumed that the rise
registered in B and C before 1864 actually took place before
1862. Again, when considering the period 1870-75, we notice
that in A there is a fall till 1871, and a sharp rise to 1875, and
no change to 1878; in B, therefore, it is assumed that the wage
of 1875 is equal to that of 1878, and the fall in 1871 may be
allowed because it increases the sharpness of the rise in 1871-75.
In C it is doubtful whether the 12s. in 1871 should not rather
be 11s. 6d. The reasons against are that a gain on a low wage
is often not so easily lost as a gain on a high one; 6d. is a larger
drop proportionately on 12s. than on 15s.; that the rise of 3s. 6d.
which would then be shown 1871-75 is a larger proportionate
rise than in either A or B; and that the existence of the fall in
1870-71 depends only on the evidence of a fall between 1866-71.
When the figures are few in number, it is necessary to examine
them in this way to pick out the most probable; and it is often
fairly easy to fill in the figures which satisfy all the existing
evidence fairly closely.
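The second assumption, that of uniform change in point of time, amounts to interpolation along a straight line. A minimal sketch, using the B figures for 1878 (21s.) and 1881 (20s.) expressed in pence; the function name is invented here.

```python
def interpolate_uniform(year0, value0, year1, value1, year):
    """Value at `year` if the change from year0 to year1 was uniform in time."""
    return value0 + (value1 - value0) * (year - year0) / (year1 - year0)


if __name__ == "__main__":
    # District B: 21s. (252d.) in 1878 and 20s. (240d.) in 1881.
    for year in (1879, 1880):
        pence = interpolate_uniform(1878, 252, 1881, 240, year)
        shillings, odd_pence = divmod(round(pence), 12)
        print(f"{year}: {shillings}s. {odd_pence}d.")
```

This gives 20s. 8d. for 1879 and 20s. 4d. for 1880, the figures named in the text as the alternative to the round numbers actually adopted.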

The question at once arises, What certainty have we that 
these quantities, by hypothesis unknown, are in reality anywhere 
near the figures which on the face are most probable ? 

In some cases of interpolation, dealt with presently, the
answer can be given as a statement of mathematical
probability, such as: it is 2 to 1 against a divergence
of 6d. from the assigned figure, 30 to 1 against
one of 1s., 1,000 to 1 against one of 2s. 6d., and so on; but
in the figures most often cropping up in investigations it is
not possible to assign such a precise probability. There is
one rough but useful way of testing the accuracy of such
interpolation as in the case before us which can be explained
by an example. Test how far we can throw out our calculated
average for 1870, without violently infringing the common-sense
of the question. Make A and C as large as possible in these
dates; we may perhaps suppose a rise of 1s. above 1866, seeing
that there is one in B between 1864 and 1870. We can hardly
suppose either that 1870 is as high as 1875-78, or that there is
a great drop of as much as 2s. in the single year, if we are
acquainted with the causes that determine the wages at those
dates. Let the highest wage we can assign to A and C be
16s. 6d. and 13s. 6d. respectively. Our average is then 16s. 8d.
instead of 15s. 8d. Similarly, we might perhaps think that
14s. and 11s. were the lowest possible in A and C in 1870;
then the average would be 15s. Assuming that we know enough
about the general trend of events at these dates to assign limits
in this way, we can say it appears improbable that the average
wage in 1870 was less than 15s. or more than 16s. 8d., and that
the evidence points to 15s. 8d.
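The test of limits can be put in a few lines. The sketch assumes, with the table, that B is a given return of 20s. in 1870, so that only A and C are open to variation; the function name is invented for the illustration and all sums are in pence.

```python
B_1870 = 240  # District B in 1870, a given return of 20s., in pence.


def average_1870(a_pence, c_pence, b_pence=B_1870):
    """Unweighted average of the three districts for 1870, in pence."""
    return (a_pence + b_pence + c_pence) / 3


# Highest admissible: A 16s. 6d., C 13s. 6d.  Lowest: A 14s., C 11s.
HIGHEST = average_1870(16 * 12 + 6, 13 * 12 + 6)
LOWEST = average_1870(14 * 12, 11 * 12)
ADOPTED = average_1870(15 * 12, 12 * 12)

if __name__ == "__main__":
    for label, pence in (("lowest", LOWEST), ("adopted", ADOPTED),
                         ("highest", HIGHEST)):
        shillings, odd_pence = divmod(round(pence), 12)
        print(f"{label}: {shillings}s. {odd_pence}d.")
```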

The accuracy of our interpolation then depends: (1) On
knowledge of the possible fluctuations of the figures, to be
obtained by a general inspection of the fluctuations at dates
for which they are given; (2) on knowledge of the course of
the events with which the figures are connected.

[Sidenote: Numerical example.]
A second example of a similar kind* may be
given to illustrate the numerical calculation.

* Taken from Agricultural Wages in England, in the Statistical Journal,
December 1898, by the present author.



Northern Counties. Weekly Agricultural Wages in

                             1867-69.    1869-70.
                             s.  d.      s.  d.
Cheshire                     _13  1_     13  6
Lancashire                   15  0       15  0
West Riding of Yorkshire     14  6       16  5
East Riding of Yorkshire     14  6       _14 11_
North Riding of Yorkshire    14  6       15  4
Durham                       16  6       16  0
Northumberland               16  6       16  7
Cumberland                   _14  4_     14  9
Westmoreland                 _15  7_     16  1

Roman figures given. Italic figures interpolated.

The averages of the wages in the five districts for which
data exist in both periods are 15s. 4.8d. in 1867-69 and 15s. 10.4d.
in 1869-70, that is in the ratio 33 : 34. If we assume that the
wages in the other counties have been influenced by similar
causes and increased in the same ratio, we obtain the figures
interpolated in the table. The unweighted averages for the
northern counties are now 14s. 11d. and 15s. 5d. in the two
periods, instead of 15s. 3d. and 15s. 5d., the averages of the
given numbers. For general comparison all over England
between these two years we should have been obliged to neglect
the missing counties in both years, which would have unfairly
lowered the general average, since these counties have in recent
times had wages above the English average though below that
of the northern district. At the same time we should have
unfairly raised the apparent average of the northern district.
We should also have lost the probable figures for the special
counties at the earlier date which are on a fairly safe basis;
for the wages in these counties of the Northern District remain
in nearly the same order through the last fifty years. At the
same time it is easily seen that these wages are not so accurately
known as those not interpolated, and it is well to notice in
arguments based on such figures, to what extent the interpolated
figures are involved.
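The ratio method can be followed step by step in the sketch below. It is an illustration under assumptions: the wages are entered in pence as far as the table can be read (the readings of 16s. 1d. for Westmoreland and of the pence in the other entries are assumptions where the print is unclear), and the function names are invented here.

```python
from fractions import Fraction

# Counties with returns in both periods: (1867-69, 1869-70), in pence.
BOTH = {
    "Lancashire": (180, 180),
    "West Riding": (174, 197),
    "North Riding": (174, 184),
    "Durham": (198, 192),
    "Northumberland": (198, 199),
}

# Ratio of the earlier to the later total for these five counties.
RATIO = Fraction(
    sum(early for early, _late in BOTH.values()),
    sum(late for _early, late in BOTH.values()),
)


def earlier_from_later(later_pence):
    """Interpolate an 1867-69 wage from a given 1869-70 wage."""
    return round(later_pence * RATIO)


def later_from_earlier(earlier_pence):
    """Interpolate an 1869-70 wage from a given 1867-69 wage."""
    return round(earlier_pence / RATIO)


if __name__ == "__main__":
    print("ratio", RATIO)
    print("Cheshire 1867-69:", divmod(earlier_from_later(162), 12))
    print("Cumberland 1867-69:", divmod(earlier_from_later(177), 12))
    print("Westmoreland 1867-69:", divmod(earlier_from_later(193), 12))
    print("East Riding 1869-70:", divmod(later_from_earlier(174), 12))
```

On these readings the five complete counties total 924d. and 952d., which is exactly the ratio 33 : 34 stated in the text.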

A process very similar to that just employed is used in
giving marks at school to students who are absent from a lesson ; 
attention is paid both to the particular student's general place 
in the class order, and to the average value of the marks obtained 
by the rest of the class in the lesson missed. 

Though the method be fairly complete it is very important
to notice that interpolated figures rest on quite a different class
of evidence to those which are the result of direct
evidence. In some cases they may represent
quantities which have no existence (as in the case
of school marks) and which are only used for
convenience of calculation. In others they are simply figures
adopted as those which in default of definite knowledge appear
most probable. They must always be clearly indicated as
interpolations; it is always well to state the method by which they
are obtained, and any subsidiary information which may be
regarded as direct evidence of their accuracy, and if practicable
they may be given not as exact, but as lying between certain
limits; thus the interpolated figures for Cheshire might be
written 12s. 6d. to 13s. 6d., instead of 13s. 1d.

Several different cases are met with in interpolation, some of 
which are treated algebraically in the next section, while others 
can be illustrated at once by numerical examples. 

The Graphic Method. — If we know the values of
quantities at isolated positions, such as the numbers of the population
at the ages 25 to 35, 35 to 45, &c.; the population in
1871, 1881, 1891, &c.; wages in 1860, 1870, 1873,
&c.; the numbers whose wages are from 15s. to 20s., 20s. to 25s.,
&c., we may represent the facts by such a diagram as:

[Diagram: the given quantities plotted as points above the years
1860, 1866, 1870, 1877, 1880, 1884.]

Suppose that we need the value of the quantity in 1875. If
we were only given the two points C and D, the simplest
hypothesis, and the one to be made in the absence of any
evidence to the contrary, is that the quantity increased uniformly
between C and D; representing such an increase by the straight
line C D, the height of the point X will represent the quantity
in 1875.



If the point E is also given, the hypothesis represented by
the straight lines C D, D E will not stand, for it assumes a sudden
break in the regularity at the point D in 1877, for which there
is no evidence. We must take into account all the points given,
and through them all a line must be drawn whose curvature is
as smooth as possible, for in the absence of evidence to the
contrary, sudden changes in the quantities may be assumed not
to exist. Such a curve can be constructed on mathematical
principles, or may be drawn freehand; if the latter, it will often
be quite as near the facts as the argument will allow us to go.
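The difference between the straight line C D and a smooth curve through C, D and E can be shown numerically. The sketch below is an illustration only: the heights assigned to the three points are hypothetical, and the curve used is the parabola through the three points (Lagrange's form), which is one mathematical choice among several and is not put forward as the construction intended in the text.

```python
# Hypothetical heights of the points C, D, E of the diagram.
POINTS = [(1870, 100.0), (1877, 128.0), (1880, 131.0)]


def straight_line(p0, p1, x):
    """Height at x on the straight line joining two given points."""
    (x0, y0), (x1, y1) = p0, p1
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def lagrange(points, x):
    """Height at x on the polynomial passing through all the given points."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _yj) in enumerate(points):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


if __name__ == "__main__":
    print("straight line C D at 1875:", straight_line(POINTS[0], POINTS[1], 1875))
    print("smooth curve C D E at 1875:", lagrange(POINTS, 1875))
```

With these hypothetical heights the straight line gives 120 for 1875 and the curve 123; the curve rises above the chord because the slackening of the increase after D is allowed to show itself before D.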

This method only applies to continuous quantities, such as
numbers at different ages, population at different dates, earners
at different wages in a very large group of wages. Thus for all
England the average wage must change gradually, but the wage
of the London builders changed suddenly as the result of
strikes and arrangements at certain dates. In this case we
must draw the figure to correspond as closely as possible to
the evidence, such as:

[Diagram: a broken line through the points A, B, C, D, E.]

where A B represents a sudden rise; B C a gradually accelerated
increase due to improving trade, C D a slow falling off from
the wage reached at C, and D E a determined and successful
effort to recover the lost ground.

Periodic Figures. — If we know the annual averages of 
figures which have a yearly period and a sufficient number of 
monthly averages to estimate the periodic fluctuations by the 
method described on pp. 182-7, we can interpolate figures for any 
month for which the returns are incomplete with fair accuracy. 
Thus if we are dealing with the numbers of unemployed as given 
in the Labour Gaselte, we find a periodicity which is not very 
strongly marked in all the months, but there is in general a fall 



240 ELEMENTS OF STATISTICS. 

in the spring and a rise in the late autumn, and June is generally 
the minimum month. We can then make use of the small 
diagrams on p. 184, and, having marked in all the information 
we have, draw the waves on the rising, stationary, or descending 
line of averages, so that the fluctuating lines shall pass through 
all the given points. We can obtain an idea of the accuracy 
of the resulting figures by noticing the general characteristics 
of the given figures ; we find that the percentage unemployed 
has never changed more than two units in one month, that 
there are no fluctuations which have lasted less than three or 
four months, and that the percentages have never been below 
1 or above 10. Finally, we can look at the trade history of
particular dates, and in the light we thus obtain reject any 
improbable figures. 

Use of Subsidiary Curves. — If we are able, by the 
methods described in Chapter VII., Sect. III., or Sect. V., 
to find a close connection between two series, we can use the 
more complete of them to assist the interpolation of any missing 
figures in the other. We must first investigate carefully the close- 
ness and nature of correspondence at the dates for which we 
have complete figures in both series. Then we can draw dia- 
grams, similar to those facing p. 175, one of the lines being 
incomplete. Then completing the broken line, so as to bring 
it into as close resemblance with the completed line as the given 
points allow, we shall obtain the most probable values for the 
missing figures. The accuracy of the result can be tested as
in the previous case. This method may reasonably be used 
in interpolating figures for the yield from one source of revenue 
by means of the yield from another ; for the value of exports 
from that of imports ; for the marriage rate from foreign trade ; 
for the wages in one district from those in another ; for the 
number of unemployed from the changes in consumption of 
foods ; for changes in parts of the population, when we know the 
changes in the whole, and for many other series. 

General. Series of figures may be classed in three groups:
(1) periodic; (2) symptomatic, where there is a general tendency
towards increase (as in serial wage statistics) or decrease (as in
the English birth rate in recent decades); (3) those which have
no period and no symptom, but only apparently random fluctuations.

To interpolate figures in series of the third group it is necessary
to obtain a measure of the fluctuations, by the theorems
of Part II., and then we can assign the mathematical probability
of the various numbers possible. In series of the second
group, we must pay attention in addition to the symptomatic
tendency. The necessity for interpolation of this kind does
not, however, arise frequently, so we will not offer detailed
illustrations of it.






Section 2. — Algebraic Treatment. 

The problem of interpolation to which most attention has been 
given may be stated as follows : — When one quantity is subject 
to continuous regular change, and a second quantity changes in 
connection with it, and we know or can estimate directly only 
some discontinuous values of this second quantity, it is required 
to find the probable values of the second quantity which corre- 
spond to given values of the first : for instance, given the expec- 
tation of life at the ages 15, 20, 25, &c., it is required to find it for 
intermediate ages ; given the population of the country in 1871, 
1881, 1891, 1901, find it at intermediate dates. The only permissible
assumptions are that the quantity changes continuously,
that is with no breaks at any figure, and that the rate of change 
of the quantity is also continuous, that is that the line represent- 
ing its value is not angular, but smooth. This problem differs 
from those just discussed, in that there is likely to be a law 
binding such figures together, whereas in the former cases the 
consecution was apparently random. 

It is necessary to divide the problem into two classes : — 

A. Where the given values may be assumed to be accurate ; 

B. Where the given values are liable to correction. 

A. Some preliminary algebra is necessary; it is derived 
principally from Boole's Finite Differences and De Morgan's 
Differential Calculus, to which authorities readers may be referred
for more detailed treatment. 

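1. Let \(y\) be a continuous function of \(x\), and let \(y_0, y_1, y_2, \ldots\)
be any values of \(y\).

Let \(\Delta_0^1, \Delta_1^1, \Delta_2^1, \ldots\) be written for \(y_1 - y_0,\ y_2 - y_1,\ y_3 - y_2, \ldots\);
\(\Delta_0^2, \Delta_1^2, \Delta_2^2, \ldots\) for \(\Delta_1^1 - \Delta_0^1,\ \Delta_2^1 - \Delta_1^1,\ \Delta_3^1 - \Delta_2^1, \ldots\);
\(\Delta_0^3, \Delta_1^3, \Delta_2^3, \ldots\) for \(\Delta_1^2 - \Delta_0^2,\ \Delta_2^2 - \Delta_1^2,\ \Delta_3^2 - \Delta_2^2, \ldots\);

and so on.

\(\Delta_0^1, \Delta_1^1, \ldots\) are called differences of the 1st order;

\(\Delta_0^2, \Delta_1^2, \ldots\) differences of the 2nd order, and so on.

It is easy to show that

\[ \Delta_0^2 = y_2 - 2y_1 + y_0, \qquad \Delta_1^2 = y_3 - 2y_2 + y_1, \quad \text{\&c.} \]

\[ \Delta_0^3 = y_3 - 3y_2 + 3y_1 - y_0, \qquad \Delta_1^3 = y_4 - 3y_3 + 3y_2 - y_1, \quad \text{\&c.} \]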



and generally that

\[ \Delta_0^r = y_r - r\,y_{r-1} + \frac{r(r-1)}{1\cdot2}\,y_{r-2} - \ldots \text{ to } r+1 \text{ terms} \qquad (\alpha) \]

and

\[ \Delta_s^r = y_{r+s} - r\,y_{r+s-1} + \frac{r(r-1)}{1\cdot2}\,y_{r+s-2} - \ldots \text{ to } r+1 \text{ terms} \qquad (\beta) \]

the coefficients being those of the binomial expansion. Equations (α)
and (β) are easily proved by induction.

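We can also express values of \(y\) in terms of \(y_0\) and the differences;

\[ y_1 = y_0 + \Delta_0^1, \quad y_2 = y_1 + \Delta_1^1 = y_0 + 2\Delta_0^1 + \Delta_0^2, \quad y_3 = y_0 + 3\Delta_0^1 + 3\Delta_0^2 + \Delta_0^3, \ \ldots \]

and generally:

\[ y_r = y_0 + r\Delta_0^1 + \frac{r(r-1)}{1\cdot2}\Delta_0^2 + \frac{r(r-1)(r-2)}{1\cdot2\cdot3}\Delta_0^3 + \ldots \text{ to } r+1 \text{ terms} \qquad (\gamma) \]

\[ \Delta_r^s = \Delta_0^s + r\Delta_0^{s+1} + \frac{r(r-1)}{1\cdot2}\Delta_0^{s+2} + \frac{r(r-1)(r-2)}{1\cdot2\cdot3}\Delta_0^{s+3} + \ldots \text{ to } r+1 \text{ terms} \qquad (\delta) \]

\[ y_{r+s} = y_s + r\Delta_s^1 + \frac{r(r-1)}{1\cdot2}\Delta_s^2 + \frac{r(r-1)(r-2)}{1\cdot2\cdot3}\Delta_s^3 + \ldots \text{ to } r+1 \text{ terms} \qquad (\epsilon) \]

which formulae can be proved by induction, and in fact (γ) is a special
case both of (δ) and (ε).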
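
The identities of this paragraph can be checked numerically. The following is a minimal Python sketch, not part of the original text: the function names are illustrative assumptions, and the sample values are the wage figures tabulated in example (1.) of paragraph 10. It builds the table of differences, then confirms the binomial form of the leading difference and the rebuilding of a value from \(y_0\) and the leading differences.

```python
from math import comb

def difference_table(ys):
    """Return [ys, 1st differences, 2nd differences, ...] down to a single entry."""
    table = [list(ys)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([b - a for a, b in zip(prev, prev[1:])])
    return table

def leading_difference(ys, r):
    """r-th leading difference written out with binomial coefficients."""
    return sum((-1) ** k * comb(r, k) * ys[r - k] for k in range(r + 1))

def value_from_differences(ys, r):
    """y_r rebuilt from y_0 and the leading differences."""
    table = difference_table(ys)
    return sum(comb(r, k) * table[k][0] for k in range(r + 1))

ys = [39, 296, 599, 804, 918, 966]   # wage figures from example (1.)
table = difference_table(ys)
print(table[1])                       # first differences
print(leading_difference(ys, 3), table[3][0])
print(value_from_differences(ys, 5))
```

Both routes give the same third leading difference, and the last line recovers the final tabulated value from the first entry of each column.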

2. If we assume that \(y\) can be expanded in ascending powers
of \(x\), and is a rational function of the \(n\)th order, we have

\[ y = a_0 + a_1 x + a_2 x^2 + \ldots + a_n x^n, \qquad (\zeta) \]

where \(a_0, a_1, \ldots a_n\) are constants.

Suppose now that \(y_0, y_1, y_2, \ldots y_n\) are the values of \(y\) which
correspond to values of \(x\) increasing in arithmetic progression, viz.,
\(x_0,\ x_0 + h,\ x_0 + 2h, \ldots x_0 + nh\); then, on this assumption, we have
\(n + 1\) equations to determine the constants in equation (ζ).

We can, however, write equation (ζ) in terms of the \(y\)'s and \(\Delta\)'s
without evaluating the constants; for

\[ y = y_0 + \frac{x - x_0}{h}\Delta_0^1 + \frac{(x - x_0)(x - x_0 - h)}{h\cdot2h}\Delta_0^2 + \frac{(x - x_0)(x - x_0 - h)(x - x_0 - 2h)}{h\cdot2h\cdot3h}\Delta_0^3 + \ldots \text{ to } n+1 \text{ terms} \qquad (\eta) \]

is an equation of the \(n\)th degree in \(x\), and reduces to the identity (γ)
above, if we substitute \((x_0 + rh)\) and \(y_r\), where \(r\) is an integer not greater
than \(n\), for \(x\) and \(y\).

Hence equation (η) is of the same order and satisfied by the same
\(n + 1\) pairs of values as equation (ζ), and is therefore identical with it.

Again, if \(y_0, y_1, \ldots y_n\) are values of \(y\) corresponding to any values
of \(x\), viz., \(x_0, x_1, \ldots x_n\), it can be shown that the equation

\[ y = y_0\,\frac{(x-x_1)(x-x_2)\ldots(x-x_n)}{(x_0-x_1)(x_0-x_2)\ldots(x_0-x_n)} + y_1\,\frac{(x-x_0)(x-x_2)\ldots(x-x_n)}{(x_1-x_0)(x_1-x_2)\ldots(x_1-x_n)} + \ldots + y_n\,\frac{(x-x_0)(x-x_1)\ldots(x-x_{n-1})}{(x_n-x_0)(x_n-x_1)\ldots(x_n-x_{n-1})} \qquad \text{Lagrange's formula } (\theta) \]

is equivalent to (ζ), for it is of the same degree in \(x\), and is satisfied by
the same \(n + 1\) pairs of values of \((x, y)\), viz., \((x_0, y_0), (x_1, y_1), \ldots (x_n, y_n)\).

3. Still assuming that an equation of the form of (ζ) satisfies
the conditions, we can at once interpolate any values needed.

Thus, if we are given that \(y_0, y_1, y_2, \ldots y_{n-1}\) are values corresponding
to \(x = 1, 2, 3 \ldots n\) respectively, we can find \(y_s\), where \(s\) is fractional,
by putting \(x_0 = 1\), \(h = 1\), \(x = 1 + s\) in equation (η). We obtain

\[ y_s = y_0 + s\,\Delta_0^1 + \frac{s(s-1)}{1\cdot2}\Delta_0^2 + \frac{s(s-1)(s-2)}{1\cdot2\cdot3}\Delta_0^3 + \ldots \text{ to } n \text{ terms} \qquad (\iota) \]

We can easily obtain similar formulae for any other intervals.

4. Notice that, if \(y_0, y_1, \ldots y_n\) correspond to values of \(x\)
(\(x_0\), \(x_0 + h\), &c.) in arithmetic progression,

\[ y_0 = a_0 + a_1 x_0 + \ldots + a_n x_0^n \]

and

\[ y_1 = a_0 + a_1(x_0 + h) + \ldots + a_n(x_0 + h)^n \quad \text{from } (\zeta); \]

\[ \therefore\ \Delta_0^1 = a_1 h + \ldots + a_n\{(x_0 + h)^n - x_0^n\} = a_1 h + \ldots + a_n\, n h\, x_0^{n-1} + \text{terms of lower degree in } x_0, \]

an equation of the \((n-1)\)th degree only.

Continuing this process, we obtain

\[ \Delta_0^n = a_n h^n\, n!, \]

and there are no higher differences.

Also, since \((x_0, y_0), (x_0 + h, y_1), \ldots (x_0 + (n+1)h,\ y_{n+1})\) lie in a curve of
the \(n\)th degree, we have from equation (α):

\[ y_{n+1} - (n+1)\,y_n + \frac{(n+1)n}{1\cdot2}\,y_{n-1} - \frac{(n+1)n(n-1)}{1\cdot2\cdot3}\,y_{n-2} + \ldots \text{ to } n+2 \text{ terms} = \Delta_0^{n+1} = 0 \qquad (\kappa) \]

5. If for any purpose we need to evaluate the constants in
equation (ζ), we can abbreviate the solution, as follows, if the \(x\)'s
are in arithmetic progression.

Given five pairs of values, we have

\[ \begin{aligned}
y_0 &= a_0 + a_1 x_0 + a_2 x_0^2 + a_3 x_0^3 + a_4 x_0^4 \\
y_1 &= a_0 + a_1(x_0 + h) + a_2(x_0 + h)^2 + a_3(x_0 + h)^3 + a_4(x_0 + h)^4 \\
y_2 &= a_0 + a_1(x_0 + 2h) + a_2(x_0 + 2h)^2 + a_3(x_0 + 2h)^3 + a_4(x_0 + 2h)^4 \\
&\ \ \vdots \\
y_4 &= a_0 + a_1(x_0 + 4h) + a_2(x_0 + 4h)^2 + a_3(x_0 + 4h)^3 + a_4(x_0 + 4h)^4
\end{aligned} \]

As in the last paragraph \(\Delta_0^4 = a_4 h^4\, 4!\),

\[ \therefore\ a_4 = \frac{\Delta_0^4}{24h^4} \qquad (\lambda) \]

It is easily seen that \(\Delta_0^3\) is independent of \(a_0\), \(a_1\) and \(a_2\), and that

\[ \Delta_0^3 = a_3\{(x_0 + 3h)^3 - 3(x_0 + 2h)^3 + 3(x_0 + h)^3 - x_0^3\} + a_4\{(x_0 + 3h)^4 - 3(x_0 + 2h)^4 + 3(x_0 + h)^4 - x_0^4\} = 6h^3 a_3 + a_4\{24h^3 x_0 + 36h^4\} \]

whence

\[ a_3 = \frac{\Delta_0^3}{6h^3} - \Delta_0^4\left(\frac{x_0}{6h^4} + \frac{1}{4h^3}\right); \]

while

\[ \Delta_0^2 = 2h^2 a_2 + a_3(6h^2 x_0 + 6h^3) + a_4(12h^2 x_0^2 + 24h^3 x_0 + 14h^4), \]

which gives \(a_2\). \(a_1\) and \(a_0\) can then be found from the first two equations. The
points of inflexion on the curve, \(y = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4\), are
determined by the equation

\[ \frac{d^2y}{dx^2} = 2a_2 + 6a_3 x + 12a_4 x^2 = 0, \]

and the sign of \(\frac{d^3y}{dx^3}\), i.e. of \(a_3 + 4a_4 x\), decides the nature of the change
of curvature.

This method is employed on page 254 infra.

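6. In evaluating the constants it will be found that the
following identities are sometimes useful:

\[ n^r - {}^sC_1\,(n-1)^r + {}^sC_2\,(n-2)^r - \ldots \]

= the coefficient of \(x^r\) in the expansion
of \(r!\,\{e^{nx} - {}^sC_1\,e^{(n-1)x} + {}^sC_2\,e^{(n-2)x} - \ldots\}\),
i.e. of \(r!\,e^{(n-s)x}\,(e^x - 1)^s\),
i.e. of \(r!\,\left(1 + (n-s)x + \frac{(n-s)^2x^2}{2!} + \ldots\right)\,x^s\left(1 + \frac{x}{2!} + \frac{x^2}{3!} + \ldots\right)^s\).

Coefficient is \(0\), when \(r = 1, 2, 3 \ldots s-1\);
\(r!\), when \(r = s\);
\(\left(n - \frac{s}{2}\right)(s+1)!\), when \(r = s+1\). (μ)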



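7. It is necessary to express the differential coefficients of
\(y\) with regard to \(x\) in equation (ζ) in terms of the differences,
and conversely. Now from a comparison of (ζ) and (η) we
have, when \(x_0\) is zero:

\[ a_1 h = \Delta_0^1 - \tfrac12\Delta_0^2 + \tfrac13\Delta_0^3 - \tfrac14\Delta_0^4 + \ldots \]

Writing \(y = f(x)\) for equation (ζ), the equation just written gives,
when \(x = x_0 = 0\):

\[ h f'(0) = \Delta_0^1 - \tfrac12\Delta_0^2 + \tfrac13\Delta_0^3 - \tfrac14\Delta_0^4 + \ldots \]

Applying the same process again and again, and remembering that
\(\Delta_0^2\) bears the same relation to \(\Delta_0^1\) as \(\Delta_0^1\) bears to \(y_0\), and so on, we
obtain, omitting the suffixes:

\[ h^n f^{(n)}(0) = \left(\Delta - \tfrac12\Delta^2 + \tfrac13\Delta^3 - \ldots\right)^n, \qquad (\nu) \]

where the \(\Delta\)'s are to be treated as ordinary algebraic quantities, till the
exponent is removed.

Thus

\[ h^2 f''(0) = \Delta^2\left(1 - \tfrac12\Delta + \tfrac13\Delta^2 - \ldots\right)^2 = \Delta^2 - \Delta^3 + \tfrac{11}{12}\Delta^4 - \ldots \]

\[ h^3 f'''(0) = \Delta^3 - \tfrac32\Delta^4 + \text{\&c.} \]

It is to be noticed in particular that each derived function
depends only on differences of as high an order as itself.

Again, by Taylor's Theorem,

\[ \begin{aligned}
y_1 &= f(x_0 + h) = f(x_0) + h f'(x_0) + \frac{h^2}{2} f''(x_0) + \ldots \\
y_0 &= f(x_0) \\
y_2 &= f(x_0 + 2h) = f(x_0) + 2h f'(x_0) + \frac{(2h)^2}{2} f''(x_0) + \ldots
\end{aligned} \]

\[ \therefore\ \Delta_0^2 = y_2 - 2y_1 + y_0 = h^2 f''(x_0) + h^3\{\ \ \}, \]

and, using equations (α) and (μ), or otherwise, generally

\[ \Delta_0^n = h^n f^{(n)}(x_0) + h^{n+1}\{\ \ \} \qquad (o) \]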
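
As a hedged check of the relation between derived functions and differences, the following Python sketch (not from the original; the cubic and the step are arbitrary illustrative choices) forms the leading differences of a cubic and recovers its first three derivatives at the starting point. Because the fourth and higher differences of a cubic vanish, the truncated series are exact here; for a general function they would only be approximate.

```python
def leading_differences(f, x0, h, n):
    """Return [y0, D1, D2, ..., Dn]: the leading differences of f at x0 with step h."""
    ys = [f(x0 + k * h) for k in range(n + 1)]
    out = []
    while ys:
        out.append(ys[0])
        ys = [b - a for a, b in zip(ys, ys[1:])]
    return out

def f(x):
    # An arbitrary cubic: f'(x) = 3 - 2x + 1.5x^2, f''(x) = -2 + 3x, f'''(x) = 3.
    return 2 + 3 * x - x**2 + 0.5 * x**3

x0, h = 1.0, 0.5
_, d1, d2, d3 = leading_differences(f, x0, h, 3)

first = (d1 - d2 / 2 + d3 / 3) / h   # h f'   = D - D^2/2 + D^3/3
second = (d2 - d3) / h**2            # h^2 f'' = D^2 - D^3 (higher terms vanish)
third = d3 / h**3                    # h^3 f''' = D^3
print(first, second, third)
```

The three printed values agree with the derivatives of the chosen cubic at \(x = 1\).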

8. We may now consider the assumptions made when we
took (ζ) to express the relation between \(y\) and \(x\).

If \(y\) and \(x\) are connected by any functional law, that is if \(y\) is
determinate for all given values of \(x\), without which assumption
interpolation is meaningless, then \(y\) can be expressed as a function
of \(x\); let \(y = f(x)\), then, by Maclaurin's theorem:

\[ y = f(x) = f(0) + x f'(0) + \frac{x^2}{2!} f''(0) + \frac{x^3}{3!} f'''(0) + \ldots \]

to an infinite number of terms.

If \(f^{(n+1)}(0)\) and following coefficients are very small, and \(x\) is
never large, the terms from the \((n+2)\)th onwards become negligible
in comparison with earlier terms, so that the first \(n+1\) terms
determine the value of \(y\) approximately. Now by the equations
(ν) and (o), \(f^{(n+1)}\) is small when \(\Delta^{n+1}, \Delta^{n+2}, \ldots\) are small, and vice
versa. Hence we have the following general statement: any
functional relation between \(y\) and \(x\) reduces to the parabolic
equation of the \(n\)th degree (ζ), if the differences of orders higher
than the \(n\)th vanish, and if these differences do not vanish but are
small, equation (ζ) is still an approximate expression for the
relation.
Now if the line drawn through the given points is to have 
continuous and slowly changing curvature, it is easily verified 
that the second differences for points near together are not large, 
for a rapid change in the rate of increase of the ordinate means 
a rapid change of curvature ; and if we construct a second curve 
with the same abscissas and the first differences as ordinates, 




small third differences will indicate absence of rapid change in 
the first, and so on ; but beyond this point it is not easy to 
see the connection between the hypothesis underlying inter- 
polation and the diminution of successive differences. The 
converse, however, is clearer ; if in any series of figures it is 
found experimentally that the successive differences tend to 
disappear, then any curve which passes through the points is 
expressed approximately by the parabolic equation. De Morgan 
states this conclusion thus : — " If we take n points near each 
other, and having their abscissae in arithmetic progression, with a 
small or at least not very large common difference, and their ordinates
not very unequal . . . the parabola of the \((n-1)\)th order will
very nearly coincide with any regular curve of the same general
appearance, at least between the same points." Boole's explanation
is: "It is customary to assume for the general expression
of the values under consideration a rational and integral function
of \(x\), and to determine the constants by the given conditions.
This assumption rests upon the supposition (a supposition, how- 
ever, actually verified in the case of all tabulated functions *) that 
the successive orders of differences rapidly diminish." 

Since, from equation (o), when \(h\) is small, the successive
differences for any curve diminish as their order becomes higher, 
it is a legitimate process to build up a series of values of any 
function on the hypothesis that the higher differences vanish. 

If a freehand curve is drawn so as to pass through the chosen 
fixed points, and to have curvature which changes as slowly as 
possible, a line will be obtained which lies very near that given 
by equation (ζ). Such a line would be similar to the track of a
bicyclist who was riding so as to pass over several marks, or to 
just avoid several obstacles. 

9. It is clear from the above analysis that we can make a 
smooth continuous curve pass through any number of points we 
please; for with the parabolic equation (ζ) there are never any
sudden jumps in the values of \(y\), \(\frac{dy}{dx}\) or \(\frac{d^2y}{dx^2}\), as \(x\) changes
continuously; and we can obtain as many linear equations (which
have always real values) as there are constants, simply by taking
\(n\) in the original equation to be the number of fixed points.

* I.e., not statistical approximations.




If we have, let us say, 10 points, as — 



and wish to find a point on a fixed vertical line between F and G, we
can either take only F and G into consideration, and, joining them
by a straight line, obtain the point \(x_1\); or considering E, F, and G,
or F, G, and H, draw parabolas and obtain \(x_2\) or \(x_3\); or considering
E, F, G, and H, draw a parabola of the third order, which would
have a point of inflexion near F; this would be approximately the
path a bicyclist might follow if he had to start from E, and ride
to a near point H, passing close to F and G. If we now include
D and K (if our bicyclist has to start from D, pass E, F, G, and H,
and reach K) we shall modify the curvature throughout; and as
we include more and more points shall continue to affect slightly
the path F G. If the inclusion of the nearer points tends to
make the line F G approximate more and more closely to a
final position, while the further inclusion of the more distant 
points throws it further away, we may conclude that the positions 
of these further points are not governed by the same numerical 
conditions as the nearer one. Thus in a "table of survivals" 
the figures for ages under 5 years are not distributed in accord- 
ance with the curve determined by the figures for higher ages ; 
in a table showing wages, it may be seen that those of highly 
paid workmen are not governed by the same causes as those 
lower in the scale. On the other hand, the number in each 
census is dependent on all the previous numbers for more than 
one generation. In interpolating for the population of 1876 we 
shall obtain different figures according as we include 1851, '61, 
'71, '81, '91 only, or 1901 as well ; and this is not surprising, for 
a mistake made in 1876 may not come to light till we have 
watched the growth of the population for twenty-five years. It 
is clear that the points far from the period in which the interpolation
is to be done cannot be allowed so much influence as
those nearer, and it appears experimentally that this condition
is fulfilled in the method discussed; also, in series (η) the successive
coefficients begin to diminish with the \(r\)th term where
\(x < x_0 + (2r-3)h\), that is with the coefficient of the first difference
when \(x\) is between \(x_0\) and \(x_0 + h\). It may be noticed that
the wanderings of the curve are limited by the condition that a
curve of the \((n-1)\)th order cannot have more than \(n-3\) points of
inflexion, for \(\frac{d^2y}{dx^2}\) has no term of a higher degree than \(x^{n-3}\).

In the above illustration the intermediate points from F to 
G might be found from the five points D, E, F, G, H, or from 
E, F, G, H, K. These two curves may be welded together be- 
tween F and G. The points near F are more accurately deter- 
mined by the first, of which it is the middle ; those near G by the 
second. The welding line should touch the first at F, the second 
at G. This is conveniently done by the use of the sine curve. 
This method is employed, I believe, at the Registrar-General's 
office. 

It cannot be said that the present theory of statistical inter- 
polation rests on an altogether satisfactory basis.* The prin- 
ciples which govern it are not well defined, and the mathematical 
analysis of the methods, by which the principles should be 
brought into relation with the facts, is incomplete. Yet it is 
perhaps unnecessary to labour after more refined methods, for 
interpolation cannot be precise unless we actually know the 
algebraic expression of the laws which govern the figures, and 
the method here discussed is found to satisfy the conditions 
empirically, while further refinements could only introduce slight 
modifications. 

* This remark does not apply to the interpolation in evaluating mathe- 
matical functions. 






10. Examples showing the Numerical Use of the Formulae.
(1.) Given the number of wage-earners earning
sums in 5s. groups, to estimate the number earning as much as
24s. and not so much as 25s.

                          Numbers*
                          per 1,000
                          Wage-Earners            Differences.
                          (Adult males).   1st.    2nd.    3rd.    4th.
Earning as much as 10s.
 and less than   15s.          39
                                           257
                 20s.         296                   46
                                           303            -144
                 25s.         599                  -98             151
                                           205               7
                 30s.         804                  -91              18
                                           114              25
                 35s.         918                  -66
                                            48
                 40s.         966

Neglect the increasing differences arising from the number earning
less than 15s.

Using formula (η), \(x_0 = 20\) (shillings), \(h = 5\), \(y_0 = 296\), \(\Delta_0^1 = 303\),
\(\Delta_0^2 = -98\), \(\Delta_0^3 = 7\), \(\Delta_0^4 = 18\).

At 25s., \(y = 599\), from above table.

At 24s., \(x = 24\),

\[ y = 296 + \frac{4}{5} \text{ of } 303 + \frac{4\cdot(-1)}{5\cdot10} \text{ of } (-98) + \frac{4\cdot(-1)\cdot(-6)}{5\cdot10\cdot15} \text{ of } 7 + \frac{4\cdot(-1)\cdot(-6)\cdot(-11)}{5\cdot10\cdot15\cdot20} \text{ of } 18 \]

= 296 + 242.4 + 7.84 + .224 + .3168 = 547 (nearly). 

The required number is therefore 599 - 547 = 52. 

Again at 23s., \(x = x_0 + 3\), \(y = 489\), and the number earning as much
as 23s. and not so much as 24s. is 58. 
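
The evaluation of formula (η) in this example can be reproduced mechanically. The sketch below is an illustration only, not the author's own procedure; `newton_forward` is a hypothetical helper name. It multiplies each leading difference by its running coefficient \((x - x_0 - kh)/((k+1)h)\). Carrying the signs of all four factors through, the fourth-order term comes out negative, so this sketch gives about 546.1 at 24s. and about 489.3 at 23s.; the last unit of the rounded figures should therefore be treated with some caution.

```python
def newton_forward(x, x0, h, leading):
    """Evaluate the advancing-difference formula.

    leading = [y0, D1, D2, ...], the leading differences taken at x0;
    h is the common interval of the tabulated abscissae.
    """
    total = 0.0
    coeff = 1.0
    for k, d in enumerate(leading):
        total += coeff * d
        coeff *= (x - x0 - k * h) / ((k + 1) * h)
    return total

leading = [296, 303, -98, 7, 18]      # y0 and differences at 20s., from the table
y25 = newton_forward(25, 20, 5, leading)
y24 = newton_forward(24, 20, 5, leading)
y23 = newton_forward(23, 20, 5, leading)
print(y25, round(y24, 4), round(y23, 4))
```

At a tabulated point such as 25s. the formula returns the tabulated value exactly, which is a convenient check on the coefficients.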

(2.) To make an estimate for the value of imports in the year
1813, the records for which were destroyed by fire.
Given value of imports in:

1810   -   £39,202,000   -   \(y_1\)
1811   -    26,510,000   -   \(y_2\)
1812   -    26,163,000   -   \(y_3\)
1813   -        ...      -   \(y_4\)
1814   -    33,755,000   -   \(y_5\)
1815   -    32,987,000   -   \(y_6\)
1816   -    27,431,000   -   \(y_7\)



* General Report on Wages. 



From formula (κ), using \(y_3\) and \(y_5\) only:

\[ y_5 - 2y_4 + y_3 = 0, \qquad y_4 = 29{,}959. \]

From formula (κ), using \(y_2\) and \(y_6\) as well:

\[ y_6 + y_2 - 4(y_5 + y_3) + 6y_4 = 0, \qquad y_4 = 30{,}029. \]

From formula (κ), using \(y_1\) and \(y_7\) as well:

\[ y_7 + y_1 - 6(y_6 + y_2) + 15(y_5 + y_3) - 20y_4 = 0, \qquad y_4 = 30{,}421. \]

(The values of \(y\) are here taken in thousands of pounds.)

Here the first and second values are very near together,
while the third differs; hence we adopt £30,000,000 as the value
required. (Cf. a similar example in Boole's chapter.)
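
The three estimates can be reproduced by setting a difference of even order to zero and solving for the middle term. The following is a minimal sketch under that reading of formula (κ); the function and variable names are illustrative assumptions, and the figures are the tabulated imports in thousands of pounds.

```python
from math import comb

def missing_middle(values, order):
    """Solve 'difference of the given even order = 0' for the middle entry.

    values holds order+1 equally spaced entries; the middle one is unknown
    and may be passed as None, since it is skipped in the sum.
    """
    mid = order // 2
    known = sum((-1) ** k * comb(order, k) * v
                for k, v in enumerate(values) if k != mid)
    return -known / ((-1) ** mid * comb(order, mid))

imports_thousands = {1810: 39202, 1811: 26510, 1812: 26163,
                     1814: 33755, 1815: 32987, 1816: 27431}

def estimate_1813(half_width):
    """Use half_width known years on each side of 1813."""
    years = range(1813 - half_width, 1813 + half_width + 1)
    values = [imports_thousands.get(year) for year in years]
    return missing_middle(values, 2 * half_width)

for half_width in (1, 2, 3):
    print(half_width, round(estimate_1813(half_width)))
```

Widening the span brings in the exceptional figure for 1810, which is why the third estimate departs from the first two.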

(3.) In Mr Booth's Life and Labour of the People, e.g., Vol.
V., p. 46, a series of very useful diagrams is given showing the
age distribution of various classes. The figures he uses are as
follows:

                  Proportion          Average at
                  occupied per        each year of
                  10,000 of total     age between
Ages.             aged 10-80.         given limits.
10-15 years            193.5               38.7
15-20   "              880                176
20-25   "              933                186.6
25-35   "            1,636                163.6
35-45   "            1,201                120.1
45-55   "              830                 83
55-65   "              434                 43.4
65-80   "              192.5               12.8

His diagram is drawn from the last column, the numbers in
which form the ordinates for the middle of the corresponding
age periods. The points so obtained are joined by straight lines.
This method is sufficiently accurate for his purpose, but it will
afford an interesting example of interpolation if we obtain some
of the figures for intermediate years more closely.

                        Numbers up to
Age.                    corresponding limits.
\(x_1\) = 12½     -     \(y_1\) =  193.5
\(x_2\) = 17½     -     \(y_2\) = 1073.5
\(x_3\) = 22½     -     \(y_3\) = 2006.5
\(x_4\) = 30      -     \(y_4\) = 3642.5
\(x_5\) = 40      -     \(y_5\) = 4843.5
\(x_6\) = 50      -     \(y_6\) = 5673.5
\(x_7\) = 60      -     \(y_7\) = 6107.5
\(x_8\) = 72½     -     \(y_8\) = 6300

Since the \(x\)'s are not in arithmetic progression, we must
use Lagrange's formula (θ).

To find the ordinate corresponding to the age 35, for example,
we will include the five values of \(y\) from \(y_2\) to \(y_6\).

Then

\[ \begin{aligned}
y ={}& 1073.5 \times \frac{12\tfrac12\cdot5\cdot(-5)(-15)}{(-5)(-12\tfrac12)(-22\tfrac12)(-32\tfrac12)} + 2006.5 \times \frac{17\tfrac12\cdot5\cdot(-5)(-15)}{5\cdot(-7\tfrac12)(-17\tfrac12)(-27\tfrac12)} \\
&+ 3642.5 \times \frac{17\tfrac12\cdot12\tfrac12\cdot(-5)(-15)}{12\tfrac12\cdot7\tfrac12\cdot(-10)(-20)} + 4843.5 \times \frac{17\tfrac12\cdot12\tfrac12\cdot5\cdot(-15)}{22\tfrac12\cdot17\tfrac12\cdot10\cdot(-10)} \\
&+ 5673.5 \times \frac{17\tfrac12\cdot12\tfrac12\cdot5\cdot(-5)}{32\tfrac12\cdot27\tfrac12\cdot20\cdot10} \\
={}& 4412.
\end{aligned} \]

Mr Booth's method gives 4243 for the same position. 
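
The two figures just compared can be recomputed with a direct implementation of Lagrange's formula. This is a minimal sketch for illustration; the function name is an assumption, and the data are the five pairs from 17½ to 50 used above. The straight-line value is simply the mean of the ordinates at 30 and 40.

```python
def lagrange(x, xs, ys):
    """Value at x of the polynomial of lowest degree through the points (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

ages = [17.5, 22.5, 30, 40, 50]
cumulative = [1073.5, 2006.5, 3642.5, 4843.5, 5673.5]

y35 = lagrange(35, ages, cumulative)     # curve through five points
linear35 = (3642.5 + 4843.5) / 2         # straight line between ages 30 and 40
print(round(y35, 1), linear35)
```

The difference between the two results measures how far the straight-line diagram departs from a smooth curve near age 35.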

(4.) We can now determine the median and the mode more
accurately than before.* We will use the figures already employed
in Chapter IV., which may be retabulated thus:

Earning                          Differences.               Corresponding
more than   Numbers.    1st.     2nd.     3rd.     4th.     Abscissae.
$4.75            9                                               19
 4.25           13                                               17
 3.75          109                                               15
 3.25          363                                               13
 2.75          561                                               11
                         506
 2.25        1,067                464                             9
                         970              -137
 1.75        2,037                327               -15           7
                       1,297              -152
 1.25        3,334                175            -1,178           5
                       1,472            -1,330
  .75        4,806             -1,155                             3
                         317
  .25        5,123                                                1

Unit of abscissa, $.25.

To find the median use the five points whose abscissae are 11,
9, 7, 5, 3.
Equation (ζ) gives:

\[ \begin{aligned}
 561 &= a_0 + a_1\cdot11 + a_2\cdot11^2 + a_3\cdot11^3 + a_4\cdot11^4 \\
1067 &= a_0 + a_1\cdot9 + a_2\cdot9^2 + a_3\cdot9^3 + a_4\cdot9^4 \\
2037 &= a_0 + a_1\cdot7 + a_2\cdot7^2 + a_3\cdot7^3 + a_4\cdot7^4 \\
3334 &= a_0 + a_1\cdot5 + a_2\cdot5^2 + a_3\cdot5^3 + a_4\cdot5^4 \\
4806 &= a_0 + a_1\cdot3 + a_2\cdot3^2 + a_3\cdot3^3 + a_4\cdot3^4
\end{aligned} \]

Using equations (λ), we have:

\[ a_4 = \frac{-15}{24\times2^4} = -\frac{5}{128}, \quad \text{since } h = -2,\ \Delta_0^4 = -15; \]

\[ a_3 = \frac{137}{6\times8} + 15\left(\frac{11}{6\times16} - \frac{1}{4\times8}\right) = \frac{197}{48}, \quad \text{since } x_0 = 11 \text{ and } \Delta_0^3 = -137; \]



\[ 464 = 8a_2 + \frac{197}{48}\,(24\times11 - 6\times8) - \frac{5}{128}\,(48\times11^2 - 24\times8\times11 + 14\times16). \]

Equations (μ) could have been used with advantage if the difference
between successive abscissae had been unity.
\(a_1\) is found from the equation

\[ -1472 = 2a_1 + 16a_2 + 98a_3 + 544a_4 \]

and finally \(a_0 = 6972\tfrac{91}{128}\).

The median is then found from the equation

\[ 2561\tfrac12 = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4. \]

Solving by Horner's method, we find x = 6.142; and, therefore, the
median is at $1.536.
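
The first method can be followed step by step in code. The sketch below is illustrative only: it takes the constants from the differences as in paragraph 5, and then, in place of Horner's method, solves the biquadratic by simple bisection between the abscissae 5 and 7 (a substitution made here for brevity, not the procedure of the text). All function names are assumptions.

```python
def differences(values):
    """One pass of differencing."""
    return [b - a for a, b in zip(values, values[1:])]

def quartic_constants(x0, h, ys):
    """a0..a4 of y = a0 + a1 x + ... + a4 x^4 through five points at x0, x0+h, ..."""
    d1 = differences(ys)
    d2 = differences(d1)
    d3 = differences(d2)
    d4 = differences(d3)[0]
    a4 = d4 / (24 * h**4)
    a3 = d3[0] / (6 * h**3) - d4 * (x0 / (6 * h**4) + 1 / (4 * h**3))
    a2 = (d2[0] - a3 * (6 * h**2 * x0 + 6 * h**3)
          - a4 * (12 * h**2 * x0**2 + 24 * h**3 * x0 + 14 * h**4)) / (2 * h**2)

    def rest(x):
        return a2 * x**2 + a3 * x**3 + a4 * x**4

    # a1 and a0 follow from the first two of the five equations.
    x1 = x0 + h
    a1 = ((ys[1] - rest(x1)) - (ys[0] - rest(x0))) / (x1 - x0)
    a0 = ys[0] - rest(x0) - a1 * x0
    return a0, a1, a2, a3, a4

def evaluate(coeffs, x):
    return sum(a * x**k for k, a in enumerate(coeffs))

def solve_by_bisection(coeffs, target, lo, hi, steps=60):
    """Root of evaluate(coeffs, x) = target, assuming y falls as x goes from lo to hi."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if evaluate(coeffs, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

coeffs = quartic_constants(11, -2, [561, 1067, 2037, 3334, 4806])
median_x = solve_by_bisection(coeffs, 2561.5, 5, 7)
median_wage = 0.25 * median_x            # unit of abscissa is $.25
print([round(a, 4) for a in coeffs], round(median_x, 3), round(median_wage, 3))
```

The constants agree with the fractions found above, and the root lies close to 6.142.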

Second method : — 

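Suppose \(x\) expressed as a function of \(y\), and apply Lagrange's
formula (θ) suitably altered.

\[ x = 3\cdot\frac{(2561\tfrac12 - 3334)(2561\tfrac12 - 2037)(2561\tfrac12 - 1067)(2561\tfrac12 - 561)}{(4806 - 3334)(4806 - 2037)(4806 - 1067)(4806 - 561)} + 5\cdot\frac{(2561\tfrac12 - 4806)(2561\tfrac12 - 2037)(2561\tfrac12 - 1067)(2561\tfrac12 - 561)}{(3334 - 4806)(3334 - 2037)(3334 - 1067)(3334 - 561)} + \ldots \]

whence x = 6.237; that is the median is at $1.56.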

This method saves the solution of a biquadratic, and with 

small numbers would need less numerical work than the first 

method. 
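
A hedged sketch of this second method follows: the roles of abscissa and ordinate are exchanged, and Lagrange's formula is evaluated at the median count. The function name is illustrative. Evaluated in floating point, the sum comes out near 6.23, within a few thousandths of the figure quoted above.

```python
def lagrange(t, ts, values):
    """Polynomial through the points (ts, values), evaluated at t."""
    total = 0.0
    for i, (ti, vi) in enumerate(zip(ts, values)):
        term = vi
        for j, tj in enumerate(ts):
            if j != i:
                term *= (t - tj) / (ti - tj)
        total += term
    return total

numbers = [4806, 3334, 2037, 1067, 561]   # y: numbers earning more than the wage
abscissae = [3, 5, 7, 9, 11]              # x: wage in units of $.25

# Treat x as a function of y and read it off at the median count.
median_x = lagrange(2561.5, numbers, abscissae)
print(round(median_x, 3), round(0.25 * median_x, 2))
```

No equation has to be solved here, which is the saving of labour referred to in the text.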

Third method : — 

Use formula (η) to obtain the necessary equation.

Thus

\[ \begin{aligned}
2561\tfrac12 = y ={}& 561 + \frac{x-11}{-2} \text{ of } 506 + \frac{(x-11)(x-9)}{-2\times-4} \text{ of } 464 \\
&+ \frac{(x-11)(x-9)(x-7)}{-2\times-4\times-6} \text{ of } (-137) \\
&+ \frac{(x-11)(x-9)(x-7)(x-5)}{-2\times-4\times-6\times-8} \text{ of } (-15).
\end{aligned} \]

This reduces to the same equation as in the first method:

\[ 2561\tfrac12 = 6972\tfrac{91}{128} - 657\tfrac{5}{48}\,x - 33\tfrac{43}{64}\,x^2 + 4\tfrac{5}{48}\,x^3 - \tfrac{5}{128}\,x^4. \]

The quartiles, deciles, and percentiles can be found by similar
methods.

* Compare Edgeworth's Representation of Statistics by Mathematical
Formulae, Statistical Journal, 1898, p. 699.



For the mode we must take in the last number, 5123, and
recalculate \(a_2\), \(a_3\), \(a_4\) for the five highest values of \(y\), and then
solve the quadratic given in paragraph 5, viz.

\[ \frac{d^2y}{dx^2} = 2a_2 + 6a_3 x + 12a_4 x^2 = 0, \]

giving the constants their new values.

Hence x = 8.2 or 4.40; \(\frac{d^3y}{dx^3}\) is positive and ∴ \(-\frac{dy}{dx}\) a maximum,
when x = 4.40. The mode is then at $1.10.
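
The determination of the mode can be sketched in the same way. The code below is an illustration under the assumption that the constants are recomputed from the five points with abscissae 9, 7, 5, 3, 1 by the relations of paragraph 5; names are hypothetical. The smaller root of the quadratic falls at about 4.40, where the third derived function is positive; the larger root comes out close to 8.1 in this sketch.

```python
def differences(values):
    return [b - a for a, b in zip(values, values[1:])]

def upper_constants(x0, h, ys):
    """a2, a3, a4 of the quartic through five equally spaced points."""
    d2 = differences(differences(ys))
    d3 = differences(d2)
    d4 = differences(d3)[0]
    a4 = d4 / (24 * h**4)
    a3 = d3[0] / (6 * h**3) - d4 * (x0 / (6 * h**4) + 1 / (4 * h**3))
    a2 = (d2[0] - a3 * (6 * h**2 * x0 + 6 * h**3)
          - a4 * (12 * h**2 * x0**2 + 24 * h**3 * x0 + 14 * h**4)) / (2 * h**2)
    return a2, a3, a4

a2, a3, a4 = upper_constants(9, -2, [1067, 2037, 3334, 4806, 5123])

# Points of inflexion: 2*a2 + 6*a3*x + 12*a4*x^2 = 0.
disc = (6 * a3) ** 2 - 4 * (12 * a4) * (2 * a2)
roots = sorted((-6 * a3 + sign * disc**0.5) / (2 * 12 * a4) for sign in (1, -1))

# Sign of the third derived function, 6*a3 + 24*a4*x, at each root.
third = [6 * a3 + 24 * a4 * x for x in roots]
print([round(x, 2) for x in roots], [t > 0 for t in third])
print(round(0.25 * roots[0], 2))          # mode in dollars
```

Only the root with a positive third derived function corresponds to the steepest fall of the cumulative curve, that is to the greatest frequency.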

The mode can of course be determined less accurately by 
taking 4 or 3 given points instead of 5, or for greater accuracy 
more can sometimes be used. 

Another mode may be found between $2.75 and $3.75 from 
the five highest abscissae. This proves to be at $3.20. 

This method is applicable to such problems as the determina- 
tion of the date at which the population, the marriage, birth, and 
death rates, &c., increased most rapidly ; at what age the chance 
of death increases most, &c.* 

B. The second division of the problem of interpolation is 
when the original returns have to be corrected, e.g., the determination
of the distribution by age from the census returns.

We have now the problem of drawing a smooth line in the 
neighbourhood of a great number of points, but not necessarily 
through any of them. The assumption is that the returns are 
insufficient in number or deficient in accuracy, and that they 
indicate a regular distribution which it is required to represent. 

1. One method is to assume that the averages over fairly 
large groups are accurate, and to these averages to apply any of 
the methods discussed under group A. 

2. A second method has been used in the section in which 
various curves were smoothed (vide supra, Chapter VII.). This
may be restated as follows : — Take successive groups of 2, or 3, 
or 4 .... 10 points, beginning again and again at the ordinates 
for each of the given abscissae. Find the centres of gravity of 
each group ; that is, erect an ordinate equal to the average 
of the ordinates of a group at the point half-way between the 
ends of the abscissae of the outside ordinates of the group. 
Draw a line through the points so obtained. It will be found 
that this line satisfies all the conditions laid down. An 
example of this method is given in the diagram facing p. 151. 
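
The rule of centres of gravity amounts to a moving average placed at the middle of each group. A minimal sketch follows; the function name and the group size of 3 are illustrative assumptions, and the data are the figures for imported wheat per head that are tabulated further on in this section.

```python
def centres_of_gravity(xs, ys, size):
    """For each run of `size` consecutive points, return
    (abscissa half-way between the outside ordinates, average ordinate)."""
    points = []
    for start in range(len(ys) - size + 1):
        group_x = xs[start:start + size]
        group_y = ys[start:start + size]
        points.append(((group_x[0] + group_x[-1]) / 2, sum(group_y) / size))
    return points

years = list(range(1890, 1899))
wheat = [226, 244, 245, 248, 256, 285, 257, 228, 238]   # lbs. per head

for x, y in centres_of_gravity(years, wheat, 3):
    print(x, round(y, 1))
```

A larger group gives a smoother line but loses more points at each end of the series.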

* Cf. Edgeworth, in Statistical Journal, 1895, p. 381, and the references
there given.



3. In another method* the original figures are smoothed till 
the differences of the fourth or fifth or higher orders vanish; and 
then the ordinary formulae of interpolation are applied. 

Thus in example 1, on page 250, rewrite the table thus:

              Smoothed              Corrected Differences.
Wages.        Numbers.        1st.           2nd.              3rd.
Up to 20s.    296
                              303 + a
  "   25s.    599 + a                        -98 - a + b
                              205 + b                          7 - 3b
  "   30s.    804 + a + b                    -91 - a - 2b
                              114 - a - b                      25 + 2a + 3b
  "   35s.    918                            -66 + a + b
                              48
  "   40s.    966

If we put \(b = 2\tfrac13\), \(a = -16\), the third differences vanish, and we have
\(\Delta_0^1 = 287\), \(\Delta_0^2 = -79\tfrac23\), \(\Delta_0^3 = \Delta_0^4 = 0\); when \(x = 25\), \(y = 583\), and when

\[ x = 24, \qquad y = 296 + \tfrac45 \text{ of } 287 - \tfrac{4}{50} \text{ of } (-79\tfrac23) = 532; \]

so that the number earning as much as 24s. and not so much as
25s. is now found to be 51, instead of 52.

The corrections may be applied to any of the original figures.

We need to solve only one more equation to complete our table
from 20s. to 30s.

When \(x = 23\), \(y = 296 + \tfrac35 \text{ of } 287 + \tfrac{6}{50} \text{ of } 79\tfrac23\). The difference between
this and the value of \(y\), when \(x = 24\), is \(\tfrac15 \text{ of } 287 - \tfrac{2}{50} \text{ of } 79\tfrac23 = 54.2\).

We have therefore the following table, where the figures in
italics (here marked _thus_) have already been calculated, while the
others are added on the assumption that the third differences are zero.

                               Differences.
Wages.        Numbers.     1st.      2nd.     3rd.
Up to 20s.     _296_
                           63.8
  "   21s.      360                  3.2
                           60.6               ...
  "   22s.      420                  3.2
                           57.4               ...
  "   23s.      478                  3.2
                          _54.2_              ...
  "   24s.     _532_                 3.2
                           51.0               ...
  "   25s.     _583_                 3.2
                           47.8               ...
  "   26s.      631                  3.2
                           44.6               ...
  "   27s.      677                  3.2
                           41.4               ...
  "   28s.      719                  3.2
                           38.2
  "   29s.      757
                           35
  "   30s.      792

If we had taken the second differences more exactly, we
should have obtained \(804 + a + b = 790\tfrac13\) for the last figure
as in the previous table.

* Suggested to me by Mr W. F. Sheppard.

This method of writing down many figures when the signifi- 
cant differences have been found can be very generally applied 
in Group A as well as here. 

4. Another method, involving higher mathematics, would be 
discussed more suitably after the section devoted to the law of 
error ; a brief explanation with a useful formula may, however, 
be offered here. 

Suppose we have five consecutive points \((-2, y_{-2})\), \((-1, y_{-1})\),
\((0, y)\), \((1, y_1)\), \((2, y_2)\) given.

A parabola of the fourth order could be drawn through these
five points, but would have two points of inflexion. A great
number of parabolas of the third order can be drawn near all
the points, having no points of inflexion, and satisfying all the
ordinary conditions of interpolation.

Borrowing a principle from the method of least squares,* if
the coefficients of the parabola \(y = a + bx + cx^2 + dx^3\) are chosen
so as to make the quantity

\[ \sum (y - a - bx - cx^2 - dx^3)^2 \]

(where the summation extends over the five pairs of values of
\(x\) and \(y\)) a minimum, the parabola so determined will be the
best for the purpose.

For the necessary mathematical analysis, Professor Darwin's
paper On Fallible Measures,† from which this method is taken,
should be consulted.

The following equation is obtained:

\[ a = y - \tfrac{3}{35}\times\Delta_{-2}^4, \]

where \(\Delta_{-2}^4 = y_{-2} - 4y_{-1} + 6y - 4y_1 + y_2\) is the difference of the fourth order for
the \(y\)'s.

Now replace the point \((0, y)\) by the intersection of its
ordinate with the parabola, that is by \((0, a)\), where \(a\) has the
value just given, that is, diminish \(y\) by the quantity \(\tfrac{3}{35}\Delta_{-2}^4\).

Repeat the same process for each point on the original line, 
taking it as the middle of a group of 5, and a smooth curve 
lying very near all the original points is obtained. 



* See Merriman's Method of Least Squares, Chap. III.
† See Phil. Mag. and Journal, July 1877.

Thus we may smooth line C in diagram facing p. 164.

        Imported Wheat
        per head of the
        Population.             Differences.
           lbs.          1st.   2nd.   3rd.   4th.    Smoothed Figures.
1890       226
                          18
1891       244                  -17
                           1            19
1892       245                    2           -16     245 + 3/35 of 16  = 246⅓
                           3             3
1893       248                    5            13     248 - 3/35 of 13  = 247
                           8            16
1894       256                   21           -94     256 + 3/35 of 94  = 264
                          29           -78
1895       285                  -57           134     285 - 3/35 of 134 = 273½
                         -28            56
1896       257                   -1           -16     257 + 3/35 of 16  = 258⅓
                         -29            40
1897       228                   39
                          10
1898       238
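
The smoothing rule can be applied to the whole series at once. The following sketch is an illustration, with a hypothetical function name: each point that has two neighbours on either side is diminished by 3/35 of the fourth difference centred on it, and the two points at each end are left unchanged.

```python
def smooth_five_point(ys):
    """Replace each interior y by y - (3/35) * (fourth difference centred on it).

    This equals the middle ordinate of the least-squares cubic fitted to the
    five equally spaced points around it.
    """
    out = [float(y) for y in ys]
    for i in range(2, len(ys) - 2):
        d4 = ys[i - 2] - 4 * ys[i - 1] + 6 * ys[i] - 4 * ys[i + 1] + ys[i + 2]
        out[i] = ys[i] - 3 * d4 / 35
    return out

wheat = [226, 244, 245, 248, 256, 285, 257, 228, 238]   # 1890 to 1898, lbs. per head
smoothed = smooth_five_point(wheat)
for year, value in zip(range(1890, 1899), smoothed):
    print(year, round(value, 1))
```

The exceptional figure of 285 for 1895 is pulled down by about eleven pounds, while its neighbours are raised.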



5. In many series of observations it is found that the numbers
very nearly satisfy some algebraic formulae,* such as the
binomial expansion, the geometric progression, the law of error,
or some specially chosen expression. In such a case the constants
of the equation chosen are computed by methods similar
to that of the last paragraph, and the original observations are
replaced by the ordinates of the curve thus determined. Prof.
Pareto has found an equation which fits the data of the distribution
of incomes.† Modern mathematical statistics deals very
frequently in such formulae. Here we will briefly describe one
which has very practical utility, namely, Makeham's formula
for the life table.‡ If $l_x$ is the number who survive to the age $x$
out of a given generation, then the formula $l_x = k\,s^x\,g^{c^x}$, where
$k$, $s$, $g$, $c$ are selected constants, fits the records from the ages of
20 upwards with such exactness that the formula is used for
practical actuarial calculations. The formula is not quite arbitrary,
but can be obtained from the hypotheses described in the
following paragraph.

Let the quantity $\frac{l_x - l_{x+dx}}{l_x} = -\frac{dl_x}{l_x} = \mu_x\,dx$. Then $\mu_x$ is called the
"force of mortality," and represents the ratio of the number of persons
dying in a short interval to the total number alive at the beginning
of the interval.



* See Edgeworth, ibid., p. 671.
† See Statistical Journal, 1896, p. 533.
‡ See Institute of Actuaries Text-Book, Part II.


This force is supposed to consist of two parts, one constant and $= A$;
and the other such that the ratio of the increase of the force to the force
is constant, that is, that the force continually increases in a geometric
progression. For the latter part ($\mu'_x$)

$\log_e \mu'_x = Dx + E$

$\mu'_x = e^{Dx+E} = Bc^x$, where $B = e^E$, and $\log_e c = D$.

Then $\mu_x = A + Bc^x$.

This equation represents the hypothesis that the chance of death
consists of one part which is constant for all ages, and another which
is due to the power of resisting death diminishing continuously with
age in a constant ratio.

We have $-\frac{dl_x}{l_x} = \mu_x\,dx = (A + Bc^x)\,dx$

$-\log_e l_x = Ax + k_1 c^x + k_2$, where $k_1$, $k_2$ are constants.

$l_x = k\,s^x\,g^{c^x}$, where $-A = \log_e s$,

$-k_1 = \log_e g$,

$-k_2 = \log_e k$.
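
The passage from the force of mortality to the life table can be verified numerically. The sketch below is only illustrative: the constants A, B and c are hypothetical values chosen for the example (they are not actuarial constants from any table), the starting number of 100,000 lives is an assumption, and the relation k1 = B / log c is the result of integrating B c^x.

```python
import math

# Hypothetical constants, chosen only for illustration (not actuarial values).
A = 0.0007    # constant part of the force of mortality
B = 0.00005   # coefficient of the part increasing in geometric progression
c = 1.10      # ratio of that progression per year of age


def force_of_mortality(x):
    """mu_x = A + B * c**x."""
    return A + B * c ** x


# Integrating mu_x gives -log l_x = A*x + k1*c**x + k2, with k1 = B / log(c).
k1 = B / math.log(c)
s = math.exp(-A)       # from -A = log s
g = math.exp(-k1)      # from -k1 = log g
k = 100000.0 / g       # k2 fixed so that l_0 = 100,000 (an assumption)


def survivors(x):
    """l_x = k * s**x * g**(c**x)."""
    return k * s ** x * g ** (c ** x)


def numerical_force(x, h=1e-4):
    """-d(log l_x)/dx, estimated by a central difference."""
    return -(math.log(survivors(x + h)) - math.log(survivors(x - h))) / (2 * h)


for age in (20, 40, 60, 80):
    print(age, round(survivors(age)), force_of_mortality(age), numerical_force(age))
```

The last two printed columns should agree closely at every age, which is the statement that the formula for the survivors reproduces the assumed force of mortality.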

For further information on the subject of interpolation, the reader is referred
to Dr Farr's Life Table (No. 3), 1864, Boole's Finite Differences, Text-Book of
Institute of Actuaries, Part II., p. 420 seq., Rice's Theory and Practice of
Interpolation, 1899, Merrifield On Quadratures and Interpolation (British
Association Report, 1880), Chauvenet's Spherical and Practical Astronomy (Chap. II.),
Woolhouse in the Assurance Magazine (Vols. XI., XII.), Professor J. D.
Everett's Papers (published or forthcoming) On the Algebra of Difference
Tables (Quarterly Journal of Mathematics, No. 124, 1900), On a Central-difference
Interpolation Formula (British Association Report, 1900), and in the
Journal of the Institute of Actuaries, January 1901, and Mr W. F. Sheppard's
Papers On Central Difference Formulæ (Proceedings of the London Mathematical
Society, Vol. XXXI., Nos. 707-710), and On the Use of Auxiliary
Curves in Statistics of Continuous Variation (Statistical Journal, September
1900). In these other references will be found. Part of the foregoing
chapter might be simplified by the use of "central differences," but in so
short an introduction to the subject it seemed best to keep to the more
familiar method.



PART II. 



APPLICATION OF THE THEORY OF 
PROBABILITY TO STATISTICS. 






SECTION I.

Introductory. 

[Marginal note: Object of Part II.]
The arguments on which the theory of algebraic probability
depends are not difficult to follow, and are in fact grounded
on every-day experience; the development of
calculations also is often little more than straightforward
arithmetic; and without using any elaborate mathematical
theories we can examine the nature and deduce the
equation of the curve of error, which, though it is the foundation
of modern mathematical statistics, is only a reasonable summary
of common experience.

It is not proposed here to go beyond the more elementary 
and common applications of the law of error ; the more 
advanced treatment tends to deal more with theory and less 
with practical applications, and is most suitably studied in 
the original treatises scattered through the journals of various 
learned societies. The present object is to endeavour to make 
clear the groundwork of the subject, so that it will be the 
easier for students to follow modern writers on statistics; the 
mathematicians who are opening up new ground in this direc- 
tion naturally cannot stop in each article they write to establish 
the elementary theorems which are already common property, 
and so it is often not easy for readers, unfamiliar with these 
elements, to find any satisfactory discussion or proof of the 
preliminary formulae or theorems, since they are not contained 
in any text-book devoted to the subject. It is this lack of
a preliminary text-book that it is wished to supply in this
and the following sections.

The treatment is not intended to be original, and is, it is 
hoped, not inconsistent with Professor Edgeworth's published 
treatises,* since the greater part of the mathematics employed 
is gleaned from his essays, and the earlier authorities to whom 
he makes reference. The exact form of the proofs employed, 
and the particular ways in which the formulae are used, are not 
in all cases to be found elsewhere ; and any fault which may be 
found with the arguments or application of formulae must attach 
to the present writer. To avoid mere repetition of what is better 
said elsewhere, and not to cumber the ground with well-known 
elementary formulae, the reader is assumed to be acquainted 
with Dr Venn's Logic of Chance, and Whitworth's Choice and
Chance, or with the chapter on Probability to be found in
ordinary school algebras. It is hoped that the following pages, 
however, will be for the most part independent of proofs or 
formulae of which the explanation is not furnished. 

[Marginal note: Need for application of theory arises from the definition of statistics.]
To the statistician of a generation ago, to the so-called
practical men of the present day, and perhaps to some political
economists, it would seem absurd and unnecessary
to apply these tedious arguments and complicated
formulæ to the study of mere figures, which at
first sight appear subject to the ordinary rules of arithmetic;
but it will be found as we proceed that we are able by their use
to solve problems and investigate causal relations which, though
apparently simple, must entirely baffle direct attempts to obtain
[illegible] solution. The necessity of some application
of the rules of probability becomes evident from
the very definition of the science of statistics.†
Statistics deals with great numbers, the numbers of the items 
which compose some part of the economic or social body as 
a whole. It does not deal with a single homogeneous mass 
but with a complex body composed of multitudinous units 
differing in form and action one from the other; and it is 
with the complex not with the units that it is concerned. 

* See Mind and Phil. Magazine, passim; Statistical Journal, passim;
Report of British Association on Methods of Ascertaining Variations in the
Monetary Standard, 1888, and others.

† See supra, p. 7.

Just as in the mechanics of rigid bodies it is necessary to
make some hypothesis as to the laws which hold their constituent
molecules in place, before any general problems relating
to their motion as a whole can be attacked, and in the kinetic 
theory of gases a generalized theorem of the motion of the 
separate molecules is employed, so in statistics we must obtain 
some generalizing principle as to the relation of unit to unit 
before we can study the phenomena manifested by the body. 
The economist and the politician, when investigating the 
effect of a given force, are as a rule concerned with its effect 
on the whole mass, not on the individuals in particular.* 
For illustration, we may take one of the numerical totals, relating 
to a nation, that remains nearly stationary year by year ; say 
the number of marriages yearly in a population of ten millions. 
It is on the total that we trace the effects of a change in
our marriage laws. If we regarded only a single family, or a 
village or small town, we should not find any constancy ; the 
marriage rate would be changing continually with the personnel 
and age of the small community, and we could not trace with 
certainty the effect of any external cause. But add family to 
family, village to village, and district to district ; the individual 
peculiarities of the parts are rapidly lost in the total ; in a 
large community the same number are of marriageable age 
year by year, the same distribution by age and sex recurs 
continuously ; if undisturbed by external influences, the same 
marriage rate will be found over a long period. Each couple 
is influenced by many circumstances before finally deciding to 
marry; there are very many causes, each of limited effect, 
which influence the question in different localities, such as an 
exodus of young men from one district, commercial depres- 
sion in another, a new demand for labour in a third ; but 
when many districts are taken together these small disturbances 
counterbalance one another. To produce a change in the 
rate, the action of a cause is necessary which affects many 
districts in the same way. Here is to be found the assumption 
that underlies all statistical investigation, viz., that many inde- 
pendent disturbing causes of small individual effect neutralise 
one another in the mass.†

* See Mill's Logic, Book III., Chap. 23, and Book VI., Chap. 3.
† Compare the title of Lexis' treatise, viz., Zur Theorie der Massenerscheinungen
in der menschlichen Gesellschaft.

[Marginal note: The common property of great numbers.]
It is a matter of common experience that great numbers
and averages drawn from them are nearly stationary. By
searching for the common properties contained in
these numbers, we shall find the clue to this constancy.
The following are among the numbers
which do not undergo rapid change: birth, marriage, and death
rates in districts of, say over a million inhabitants ; death rates 
according to age or disease over larger areas; the numbers 
of the inhabitants of a great kingdom, even when subdivided 
by age and sex, the numbers of paupers, criminals, lunatics, 
afflicted ; the consumption of certain commodities, the total 
income, the average wage, and total imports and exports 
(though here the constancy is not so apparent). These are 
all totals of many small items, the existence of each of which is 
determined independently and apparently by chance. Another 
class is to be found in meteorological measurements, such as 
annual rainfall, mean temperature, and mean barometric height, 
where the average or total is again drawn from the combination 
of many small independent variations or contributions. An allied 
class is found in such physical measurements as average height 
and weight. 

It is not so easy to exemplify large numbers which are not
constant. The total revenue which varies with each change
in impost is an example. The number of a conscript army 
changes with the law controlling it, the number of volunteers 
with improved conditions of service, the area of the British 
Empire with each territorial extension, the volume of trade 
with a commercial inflation, the death rate with an epidemic. 
All these are changes where one cause has influenced many 
items at once in the same direction ; but even here the 
underlying constancy arising from the multitudinous small 
independent causes is apparent.

[Marginal note: Variation of great numbers.]
This constancy, marvellous as it actually is, is generally
accepted as a matter of course; and it is not the regularity
but the occasional deflections which are the subject
of comment. For instance, the death rate in
London will hardly change, except regularly with the seasons,
week by week through a series of years ; and when an increase 
of 5 per 1,000 occurs in some week, the newspapers write of 
an influenza epidemic. The mean annual rainfall will for a long 
period be near its average ; then a decrease of 5 inches excites 
remarks on a permanent change of climate. It is because this
regularity has become a matter of common experience that so
little attention is generally given to it. A cursory inspection, 
however, of the records for a period of weeks or years of any of 
these numbers will show that the constancy is not absolute ; 
that each rate varies through a great or small percentage, and, 
except that the variation seldom passes certain limits, without 
any apparent law. Thence at once rises the question, how are 
we to determine whether a given deviation is due to some 
general cause, such as an epidemic, a change of climate, or a 
new law, or is natural to the phenomena ? 

[Marginal note: Illustration by the binomial expansion.]
This question can only be answered by an appeal to the
laws of probability. To take a numerical instance: suppose we
are dealing with 1,000 men, each fifty years old,
how many should we expect to die in the year?
Fall back on former experience, and find what
has been the average death rate under similar circumstances;
this rate gives the number to be expected à priori; a great
divergence appears from past experience to be improbable,
and the greater the divergence the greater the improbability;
an exact repetition of the average itself appears to be improbable;
the question is, what divergence is to be expected?
This is insoluble directly, but we can frame a hypothesis which
throws light on the problem. Suppose the ascertained death
rate to be 50 (per 1,000), and further suppose that the chance of
death for each individual is $\frac{50}{1000} = \frac{1}{20}$. Then it is easily determined
by the rules of algebraic probability that the successive
terms in the expansion by the binomial theorem of $\left(\frac{19}{20} + \frac{1}{20}\right)^{1000}$
represent respectively the chances that exactly 0, 1, 2, 3 . . . die:

$\left(\frac{19}{20}\right)^{1000}$ is the chance that none die,

$\left(\frac{19}{20}\right)^{999}\left(\frac{1}{20}\right)$ is the chance that one assigned individual only dies,

$1000 \times \left(\frac{19}{20}\right)^{999}\left(\frac{1}{20}\right)$ that only one unassigned individual dies,

and so on. The death of exactly 50 is more probable than any
other number, 49 very nearly as probable, 51 next. It is very
soon apparent when the successive terms are calculated, that
any great divergence from 50 is very improbable.
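
The successive terms of this expansion are easily computed. The following Python sketch assumes, as in the text, 1,000 men each with a chance of death of 1/20; the names `chance`, `most_probable` and `within_15`, and the choice of a band of 15 on either side of 50, are illustrative and not part of the original argument.

```python
from fractions import Fraction
from math import comb

n = 1000                 # number of men
p = Fraction(1, 20)      # supposed chance of death for each
q = 1 - p                # chance of survival, 19/20


def chance(r):
    """Chance that exactly r of the n men die: the term nCr * p^r * q^(n-r)."""
    return comb(n, r) * p ** r * q ** (n - r)


# The number of deaths whose chance is greatest.
most_probable = max(range(n + 1), key=chance)

# Chances of 49 and of 51 deaths, relative to the chance of exactly 50.
ratio_49 = chance(49) / chance(50)
ratio_51 = chance(51) / chance(50)

# Chance that the number of deaths lies from 35 to 65 inclusive.
within_15 = sum(chance(r) for r in range(35, 66))

print(most_probable, ratio_49, ratio_51, float(within_15))
```

Exact fractions are used so that the ratios of neighbouring terms come out as simple fractions rather than rounded decimals.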

[Marginal note: Process justified by its results.]
This conception, that all the men start with the same chance
of death, or, in a more developed form, that their chances of
death are grouped about an average $\frac{1}{20}$, satisfies the à priori
conditions of the problem, and clearly leads to
results which correspond roughly at any rate
with experience; but the justice of the conception cannot be
deduced à priori, for it is universally the case with any
hypothesis as to probability, that conformity with experience
is the only justification for the hypothesis. If it is true, we
should find that when the records of many such generations of
1,000 men were examined, the divergences from the average
were grouped in the way shown by the algebraic calculation. 
The records for this particular examination are not extant ; 
but in the sequel some records will be given where experience 
marches with theory, and references will be given to books where 
others may be found ; though it may be said at once that the 
agreement is not perfect, and that there are indications that the 
law is not so simple as that already suggested. 

[Marginal note: The meaning of a numerical ...]
Consider the supposition that the chance of a death within a
year is $\frac{1}{20}$. When we say that the chance of an event is $\frac{1}{20}$, we
mean that if the circumstances connected with it
recurred again and again, the event would occur on
an average once for each twenty such recurrences.*
Thus if a die with six regular faces is thrown again and again,
the different faces tend to come uppermost with equal frequency.
As a matter of fact, each of the six would probably not be found
once in each six throws, nor exactly two of each in twelve throws;
but, in the long run, it is a matter of experience that the numbers
of times each of the six faces come uppermost tend to be equal.
Suppose an experiment, for the success of which the chance is
$\frac{1}{20}$, to be performed again and again. In 200 attempts from 8
to 12 successes may be obtained; in 2,000 the proportion of
successes to attempts will probably be nearer $\frac{1}{20}$, say 94 to 106
successes; in 2,000,000 yet nearer. Now suppose 1,000 experiments
to be made; as we have seen, exactly 50 successes are
not to be expected: but let 1,000 after 1,000 be tried; sometimes
more, sometimes less than 50 successes will be obtained;
and as the series continues the general average will tend nearer
and nearer to 50.
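
The tendency of the proportion of successes to settle near 1/20 as the attempts are multiplied can be illustrated by calculation. The sketch below is an assumption-laden illustration, not part of the original argument: it takes a fixed band of 4 to 6 successes per hundred attempts, and the trial numbers 200, 2,000 and 20,000, and sums the binomial terms falling in that band. Logarithms of the terms are used (via `math.lgamma`) only to keep the arithmetic within floating-point range.

```python
import math


def log_term(n, r, p):
    """Logarithm of the binomial term nCr * p^r * (1-p)^(n-r)."""
    return (math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)
            + r * math.log(p) + (n - r) * math.log(1 - p))


def chance_between(n, lo, hi, p=0.05):
    """Chance that the number of successes in n attempts lies from lo to hi inclusive."""
    return sum(math.exp(log_term(n, r, p)) for r in range(lo, hi + 1))


# The same band of 4 to 6 per cent., for increasing numbers of attempts.
for n in (200, 2000, 20000):
    lo, hi = n * 4 // 100, n * 6 // 100
    print(n, lo, hi, chance_between(n, lo, hi))
```

The chance of falling within the same proportional band rises with the number of attempts, which is the sense in which the proportion is said to come "nearer" to 1/20.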



* Logic of Chance, 3rd Edition, pp. 4, 5.

[Marginal note: Relation to law of error.]
Still postponing the examination of the exact grouping of
such numbers about their average, let us examine further the
nature of the argument. Suppose we are given a
series of large numbers or rates, measuring similar
quantities year after year; we shall find, when they are grouped
according to their distance from their average, that the fur- 
ther from the average the fewer are the instances. In most 
cases we cannot work backwards to a number of individuals, 
each of whom has an equal chance of furnishing an event, but 
we can examine this grouping, notice how far the numbers are 
from their average, and so on ; in many cases we shall find that 
these divergences conform to a definite law, the law of error, 
which is obeyed by all great numbers coming from series of 
experiments as just described. The point to notice specially 
here is, that correspondence to this regular law of divergence is 
natural, and it is for discrepancies that we need seek a reason. 
It is improbable, it is impossible, that great numbers should 
remain absolutely constant ; from the nature of the case there 
must be variation ; in very many cases the natural variation, the 
variation to be expected à priori, is that in accordance with the
law of error. This is so with those great numbers which are the 
sum of very many items, in favour of the existence of each of 
which there is a definite chance, or, more generally, the existence 
of each of which may be influenced by many independent causes 
each of limited effect. 

[Marginal note: Cause and chance.]
A slight confusion may arise from the use of the words
cause and chance in this statement; this can be removed by
eliminating the word chance. We say a thing
happens by chance, when its occurrence is influenced
by many independent causes whose separate effects
we cannot trace, as when we draw a card from a thoroughly 
shuffled pack. Now if we consider a man's death from 
the point of view of an insurance office, we regard the 
man as of normal health and constitution, and liable to all 
the latent diseases, the accidents, and the epidemics, from 
which experience shows men suffer ; we cannot trace the inci- 
pient development of a disease, nor foretell the chain of events 
which lead to an accident. We then speak simply of the 
chance of death within a certain period, and say experience 
shows it to be (e.g.) $\frac{1}{20}$, and, regarding the peculiarities of a
particular man as unknown, we say that his chance of death is
$\frac{1}{20}$. Generalizing, any group of men, each of the given age and
in the given circumstances, is composed of individuals for each
of whom the chance of death is $\frac{1}{20}$. Now, go behind the idea of
chance to that of cause. Each death is the result of some
particular event, or, to speak more correctly, is due to the action
of a complex of many causes; all these untraceable causes produce
on an average one death among 20 living; the statement
of the numerical chance is merely the summary of these effects.
To say, then, that the number of deaths to be expected among
1,000 is the same as the number of successes to be expected in
1,000 attempts, the chance of success in each of which is $\frac{1}{20}$, is
not inconsistent with saying that the number of deaths is determined
by the action of a multitude of causes none of which by
itself produces a great effect. In either case the laws of great
numbers will be found to apply. The use of the intermediate
numerical chance only facilitates calculation.

[Marginal note: Effect of a predominant cause.]
Now suppose that a new cause is suddenly introduced, or the
action of one of the causes is intensified (say, by an epidemic),
and at once the whole scheme of calculation is
thrown out, and we get a result which does not
correspond to the probability calculation; it is this
non-correspondence which indicates the existence of a disturbing
cause.

Since the distribution in accordance with the curve of error is 
the result which may be expected à priori, whenever we are dealing
with numbers generated in this way, it is clearly necessary to
study this distribution before we can base any arguments on the 
variation of great numbers. When we have established the result 
which the independent action of a very great number of individually 
unimportant causes can produce, then, and not till then, we are in 
a position to consider the effect of a predominant cause. We 
may even be able to deduce the existence of such a cause, for if 
we find by examination that a divergence of more than 3 per 
cent, from an average is improbable, and in a particular case we 
have a divergence of 30 per cent., we are either in the presence 
of a very improbable event, or some external predominant cause 
has influenced our numbers. 






Section II.— The Equation of the Curve of Error. 

[Marginal note: The section can be omitted, but not without loss.]
In this section it is proposed to develop the algebraic equation
and properties of the curve of error, bringing them into
relation with the other sections. It will be not
impossible for non-mathematical readers to follow
the great part of the argument of this branch of statistics without
working through the mathematical proofs of the formulæ;
and the book is so arranged that this section can be omitted.
Other readers may turn this chapter through looking at the
large type only, and notice the main lines of the
argument. For any thorough student of statistics,
however, the mathematical proofs, which are so simplified in this
chapter as not to involve the integral calculus at all, and the 
differential calculus only for two small points, are essential. In 
this section an acquaintance with algebra up to and including 
the exponential and logarithmic series is assumed. Starting 
from that point, the main formulae relating to the curve of error 
are deduced. 

Elementary Theorems in Probability. 



Definition. If an event can happen in $m$ ways, and fail in $n - m$
ways, and all these ways are equally likely to occur, then $\frac{m}{n}$ is the
probability of its occurrence.

Let $\frac{m}{n} = p$, and $\frac{n-m}{n} = q$; then $p + q = 1$. $q$ is the chance that
the event will not occur. The odds in favour are $p$ to $q$, those against are
$q$ to $p$.

E.g., the chance that a card, drawn at random from a full pack, is a
spade $= \frac{13}{52} = \frac{1}{4}$.

Theorem. If $p_1$, $p_2$ are the chances of two independent events, then
$p_1 \times p_2$ is the chance that both will occur.

Suppose that $p_1 = \frac{m_1}{n_1}$, $p_2 = \frac{m_2}{n_2}$.

The first event may be expected $m_1$ times in $n_1$ trials, or $m_1 n_2$ times in
$n_1 n_2$ trials.

The second event may be expected $m_2$ times in $n_2$ trials, or $m_1 m_2$
times in $m_1 n_2$ trials.

Hence the second event will occur at same time as the first $m_1 m_2$ times
in $n_1 n_2$ trials; that is, the chance of the double event is $\frac{m_1 m_2}{n_1 n_2} = p_1 p_2$.

Examples. Independent events. The chance that two sixes will be
thrown with a pair of dice $= \frac{1}{6} \times \frac{1}{6} = \frac{1}{36}$.

Dependent events. The chance that three cards taken in succession
from the same pack shall prove to be ace, king, and queen of the same suit
in any order is $\frac{12}{52} \times \frac{2}{51} \times \frac{1}{50} = \frac{1}{5525}$: for the chance that the first card
drawn is an ace, king, or queen is $\frac{12}{52}$; supposing it to be queen,
the chance that ace or king of same suit follows is $\frac{2}{51}$; and the chance
that the third draw gives the remaining card is $\frac{1}{50}$.

The chance that 13 cards taken at random from a complete pack will
contain 8 spades and 5 clubs is $\frac{{}^{13}C_8 \times {}^{13}C_5}{{}^{52}C_{13}}$,* for 8 spades can be chosen
in ${}^{13}C_8$ ways, 5 clubs in ${}^{13}C_5$ ways, and the hand may contain any such
group of spades with any such group of clubs; hence the numerator
given corresponds to $m$ of the definition given above; also there are
${}^{52}C_{13}$ equally likely possible hands of 13 cards, so that the denominator
given corresponds to $n$.
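
These three examples can be checked by exact arithmetic. The following Python sketch uses only the standard library; the variable names are illustrative, and the second calculation is repeated by direct counting (four suits, each in six orders, out of 52 × 51 × 50 ordered draws) as an independent check.

```python
from fractions import Fraction
from math import comb, factorial

# Independent events: two sixes with a pair of dice.
two_sixes = Fraction(1, 6) * Fraction(1, 6)

# Dependent events: ace, king and queen of one suit, in any order, in three draws.
ace_king_queen = Fraction(12, 52) * Fraction(2, 51) * Fraction(1, 50)

# The same chance by counting favourable ordered draws among all ordered draws.
by_counting = Fraction(4 * factorial(3), 52 * 51 * 50)

# A hand of 13 cards containing 8 spades and 5 clubs.
favourable_hands = comb(13, 8) * comb(13, 5)
possible_hands = comb(52, 13)
eight_spades_five_clubs = Fraction(favourable_hands, possible_hands)

print(two_sixes, ace_king_queen, by_counting)
print(favourable_hands, possible_hands, float(eight_spades_five_clubs))
```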

Theorem. If $n$ coins are placed at random on a table, the chance
that $r$ will show heads and the rest tails is $\frac{{}^nC_r}{2^n}$.

For suppose there are $n$ places to be filled each with a coin:
The first may show head or tail, two ways.
The second may show head or tail, two ways.
The first two places may therefore be filled in $2 \times 2$ ways.
The $n$ places may similarly be filled in $2^n$ ways.
Now $r$ of these places can be chosen in ${}^nC_r$ ways; and to each such
selection corresponds one arrangement in which these $r$ places are filled
with heads and the rest with tails (and many other arrangements not
giving this result).

Hence out of $2^n$ possible arrangements, ${}^nC_r$ give the result.

* ${}^nC_r$ or ${}_nC_r$ is written for $\frac{n!}{(n-r)!\,r!}$, the number of combinations of $n$
things $r$ at a time; and ${}^nP_r$ or ${}_nP_r$ is written for $\frac{n!}{(n-r)!}$, the number of
permutations of $n$ things $r$ at a time.

Aliter. Consider the product $(h_1 + t_1)(h_2 + t_2) \ldots (h_n + t_n)$.
Any term of this product, e.g., $h_1 t_2 h_3 t_4 t_5 \ldots h_{n-1} t_n$, corresponds
to one arrangement of the $n$ coins.

The number of arrangements containing $r$ heads and $n - r$ tails is
the same as the number of terms containing $r$ $h$'s and $n - r$ $t$'s.

This number is the same as the coefficient of $h^r t^{n-r}$ in the expansion
by the binomial theorem of $(h + t)^n$, which is obtained from the product
above by writing $h$ for $h_1$, $h_2$, &c., and $t$ for $t_1$, $t_2$, &c.

$(h + t)^n = h^n + {}^nC_1\,h^{n-1}t + \ldots + {}^nC_r\,h^{n-r}t^r + \ldots + t^n$.

Hence the number of arrangements producing the required result
is ${}^nC_r$. The total number of possible arrangements is the sum of the
coefficients in this expansion; this is found by putting $h = t = 1$.

$(1 + 1)^n = 1 + {}^nC_1 + \ldots + {}^nC_r + \ldots + 1$.
Hence total number $= 2^n$.

Notice that ${}^nC_r = {}^nC_{n-r}$.

Example. The coefficients in the expansion of $(1 + 1)^{32}$ are as
follows:

  r    32-r    ${}^{32}C_r = {}^{32}C_{32-r}$    Corresponding chance*
  0     32                   1         .0000000002
  1     31                  32         .0000000074
  2     30                 496         .0000001155
  3     29               4,960         .000001155
  4     28              35,960         .000008375
  5     27             201,376         .00004688
  6     26             906,192         .0002110
  7     25           3,365,856         .0007837
  8     24          10,518,300         .002449
  9     23          28,048,800         .006530
 10     22          64,512,240         .01502
 11     21         129,024,480         .03004
 12     20         225,792,840         .05257
 13     19         347,373,600         .08088
 14     18         471,435,600         .1097
 15     17         565,722,720         .1317
 16     16         601,080,390         .1400

$2^{32} = 4,294,967,296$.



The table just given shows that when 32 coins are placed on a
table at random, the chance that 16 heads and 16 tails shall appear
is .14, while it is more likely that either 15 heads (and 17 tails) or
15 tails (and 17 heads) will be found, the united chances for these
being .2634. The chance that the divergence from equal division shall
not exceed 2 (i.e., that there shall be at least 14 of each) is .1400 + 2 ×
(.1317 + .1097) = .6228; the chance that there shall be as many as 27
of one kind is only .00011, i.e., 1 in 9,000.

* Obtained by dividing each term by $2^{32}$.
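
The figures for 32 coins can be recomputed directly. The sketch below, in Python with illustrative variable names, forms the 33 coefficients and the chances built from them; the exact values may differ from the printed ones in the last decimal place, since the printed chances are rounded.

```python
from fractions import Fraction
from math import comb

n = 32
total = 2 ** n

# Coefficients in the expansion of (1 + 1)^32, for r = 0, 1, ..., 32 heads.
coefficients = [comb(n, r) for r in range(n + 1)]

# Corresponding chances, obtained by dividing each term by 2^32.
chances = [Fraction(c, total) for c in coefficients]

exactly_16 = chances[16]                       # 16 heads and 16 tails
fifteen_and_seventeen = chances[15] + chances[17]
within_two = sum(chances[14:19])               # at least 14 of each kind
# As many as 27 of one kind: 0 to 5 heads, or 27 to 32 heads.
at_least_27_of_one_kind = sum(chances[:6]) + sum(chances[27:])

print(float(exactly_16), float(fifteen_and_seventeen))
print(float(within_two), float(at_least_27_of_one_kind))
```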

The Binomial Expansion. 

The following table from Quetelet's Lettres sur la Théorie des
Probabilités, p. 375, shows a similar calculation when the index
is 999 instead of 32. For instance, ${}^{999}C_{499}\left(\frac{1}{2}\right)^{999} = .025225$, the
first quantity in Column 3.

As the index of the binomial expansion is continually 
increased, the grouping of the figures takes a definite shape. 
The curve so obtained when the index is indefinitely great is 
called the curve of error. 

In the diagram at the end of the book, the line Ag F^ Fg 
represents the first half of the coefficients of {a-\'Vf\ the line 
Ag Gi G2 G3 G4 Gg represents the coefficients of (a+^)^^; the 
line Ag Hi Hg H3 . . . H^ represents the coefficients of («+^)*^ 
and Aq Ai A2 A3 A^ is the curve of error. To fit these jagged 
lines to the curve of error, the maximum coefficient is repre- 
sented in each case by the line O Ag, and ordinates are drawn 
at equal intervals proportional to the other coefficients ; the 
tops of these ordinates are then joined by straight lines. The 
interval between successive ordinates is decided by the con- 
sideration that the area included between any chosen ordi- 
nate, say Hg Pj, the base P^ O, the maximum ordinate O Ag, 
and the line Hg H^ A2 shall be the same fraction of the whole 
area Ag O . . . H9 . . . Ag, as the part of the area of the 
limiting curve of error cut off by the same ordinate is of the 
whole area bounded by O Ag, O X and the curve. 

The algebraic determination of this limit is given on pp. 
275 seq. 

Suppose now that one ball is taken out of each of $n$ bags, each containing
$m_1$ white and $m_2$ black balls, the chance that $r$ will be white
and $n - r$ black is

${}^nC_r\,p^r q^{n-r}$, where $p = \frac{m_1}{m}$ and $q = \frac{m_2}{m}$ and $m = m_1 + m_2$.

For $r$ bags may be selected in ${}^nC_r$ ways. The chance that each of



EQUATION OF THE CURVE OF ERROR. 



2;3 



Scale of Precision. 

999 balls are drawn from a bag containing equally great numbers 

of black and white balls. 

Column I gives number of each colour. 

» 2 gives rank of deviation from equality. 

» 3 gives probability that balls will be drawn in proportion given in 

Column 2. 
«r 4 gives probability that deviation from equality will not be greater 
than that of given rank. 



3. Scale of Probability: probability that such a group will be drawn.
4. Scale of Precision: sum of probabilities starting from most probable.

   Groups of                  3.          4.       |    Groups of                  3.           4.
 White.  Black.   Rank.   Scale of    Scale of     |  White.  Black.   Rank.   Scale of     Scale of
                          Probability. Precision.  |                           Probability.  Precision.

  499     500       1     .025225     .025225      |   459     540      41     .0009458     .495278
  498     501       2     .025124     .050349      |   458     541      42     .0008024     .496081
  497     502       3     .024924     .075273      |   457     542      43     .0006781     .496759
  496     503       4     .024627     .099900      |   456     543      44     .0005707     .497329
  495     504       5     .024236     .124136      |   455     544      45     .0004784     .497808
  494     505       6     .023756     .147892      |   454     545      46     .0003994     .498207
  493     506       7     .023193     .171085      |   453     546      47     .0003321     .498539
  492     507       8     .022552     .193637      |   452     547      48     .0002750     .498814
  491     508       9     .021842     .215479      |   451     548      49     .0002268     .499041
  490     509      10     .021069     .236548      |   450     549      50     .0001863     .499227
  489     510      11     .020243     .256791      |   449     550      51     .0001525     .499380
  488     511      12     .019372     .276163      |   448     551      52     .0001242     .499504
  487     512      13     .018464     .294627      |   447     552      53     .0001008     .499605
  486     513      14     .017528     .312155      |   446     553      54     .0000815     .499686
  485     514      15     .016573     .328728      |   445     554      55     .0000656     .499752
  484     515      16     .015608     .344335      |   444     555      56     .0000526     .499804
  483     516      17     .014640     .358975      |   443     556      57     .0000421     .499847
  482     517      18     .013677     .372652      |   442     557      58     .0000334     .499880
  481     518      19     .012726     .385378      |   441     558      59     .0000265     .499906
  480     519      20     .011794     .397172      |   440     559      60     .0000209     .499927
  479     520      21     .010887     .408060      |   439     560      61     .0000164     .499944
  478     521      22     .010008     .418070      |   438     561      62     .0000128     .499957
  477     522      23     .009166     .427236      |   437     562      63     .0000100     .499967
  476     523      24     .008360     .435595      |   436     563      64     .0000077     .499974
  475     524      25     .007594     .443189      |   435     564      65     .0000060     .499980
  474     525      26     .006871     .450060      |   434     565      66     .0000046     .499985
  473     526      27     .006191     .456251      |   433     566      67     .0000035     .499988
  472     527      28     .005557     .461809      |   432     567      68     .0000027     .4999912
  471     528      29     .004968     .466776      |   431     568      69     .0000021     .4999933
  470     529      30     .004423     .471199      |   430     569      70     .0000016     .4999948
  469     530      31     .003922     .475122      |   429     570      71     .0000012     .4999960
  468     531      32     .003464     .478586      |   428     571      72     .0000009     .4999969
  467     532      33     .003047     .481633      |   427     572      73     .0000007     .4999976
  466     533      34     .002670     .484304      |   426     573      74     .0000005     .4999981
  465     534      35     .002330     .486634      |   425     574      75     .0000004     .4999984
  464     535      36     .002025     .488659      |   424     575      76     .0000003     .4999987
  463     536      37     .001753     .490412      |   423     576      77     .0000002     .4999989
  462     537      38     .001512     .491924      |   422     577      78     .00000014    .4999990
  461     538      39     .001298     .493222      |   421     578      79     .00000011    .4999991
  460     539      40     .001110     .494332      |   420     579      80     .00000004    .4999992



By means of this scale the binomial with index 999, practically equivalent to the curve of error,
can be fitted to and compared with any series of observations.
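The entries of the scale can be checked directly from the binomial with index 999 and equal chances of white and black. The following is a minimal sketch under that assumption; the function names `prob_of_rank` and `precision_of_rank` are illustrative and do not come from the text.

```python
from math import comb

N = 999  # index of the binomial, as in the scale above


def prob_of_rank(rank: int) -> float:
    """Chance of drawing exactly (500 - rank) white and (499 + rank) black balls."""
    return comb(N, 500 - rank) / 2 ** N


def precision_of_rank(rank: int) -> float:
    """Sum of the chances of ranks 1..rank, starting from the most probable group."""
    return sum(prob_of_rank(r) for r in range(1, rank + 1))


for rank in (1, 2, 3):
    print(rank, round(prob_of_rank(rank), 6), round(precision_of_rank(rank), 6))
```

The first lines printed can be compared with the scale of probability and the scale of precision for ranks 1, 2 and 3.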

274 ELEMENTS OF STATISTICS.

these will yield a white ball is $p \times p \times p \ldots$ to $r$ factors, i.e., $p^r$; the
chance that each of the other $n-r$ bags will yield a black ball is $q^{n-r}$;
hence the required chance is as stated.

Aliter.

Call the white balls in the first bag ${}_1w_1, {}_1w_2, \ldots {}_1w_{m_1}$;
,, black ,, ,, ${}_1b_1, {}_1b_2, \ldots {}_1b_{m_2}$;

,, white balls in the second bag ${}_2w_1, {}_2w_2, \ldots {}_2w_{m_1}$;

,, black ,, ,, ${}_2b_1, {}_2b_2, \ldots {}_2b_{m_2}$;

and so on; then all possible arrangements are represented by the individual
terms of the product

$$({}_1w_1 + {}_1w_2 + \ldots + {}_1w_{m_1} + {}_1b_1 + {}_1b_2 + \ldots + {}_1b_{m_2}) \times ({}_2w_1 + \ldots + {}_2b_{m_2}) \times \ldots \text{ to } n \text{ factors;}$$

e.g., a term such as ${}_1w_1 \cdot {}_2b_2 \cdot {}_3b_3 \cdot {}_4w_4 \ldots {}_nw_{m_1}$ represents one group. A $w$
will occur $r$ times and a $b$ the remaining $n-r$ times as often as the
term $w^r b^{n-r}$ occurs in the binomial expansion of $(m_1 w + m_2 b)^n$ [where
all the $w$'s are put as $w$, and all the $b$'s as $b$]. The coefficient of $w^r b^{n-r}$
in this expansion is ${}^nC_r \cdot m_1^r \cdot m_2^{n-r}$. Total number of possible arrangements
is $m^n$. Hence required chance is

$$\frac{{}^nC_r \cdot m_1^r \cdot m_2^{n-r}}{m^n} = {}^nC_r \left(\frac{m_1}{m}\right)^r \left(\frac{m_2}{m}\right)^{n-r} = {}^nC_r\, p^r q^{n-r}.$$



E.g., to find the probable number of sixes in $n$ throws of dice.
Here $m = 6$, $m_1 = 1$, $m_2 = 5$, $p = \frac{1}{6}$, $q = \frac{5}{6}$.

Probability of $r$ sixes $= {}^nC_r \left(\frac{1}{6}\right)^r \left(\frac{5}{6}\right)^{n-r}$.



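Suppose $n = 12$. Total number of possible arrangements is $6^{12} = 2,176,782,336$.

 12 sixes occur   ${}^{12}C_{12} \cdot 1^{12} \cdot 5^{0}$    times =               1
 11   ,,    ,,    ${}^{12}C_{11} \cdot 1^{11} \cdot 5^{1}$      ,,   =              60
 10   ,,    ,,    ${}^{12}C_{10} \cdot 1^{10} \cdot 5^{2}$      ,,   =           1,650
  9   ,,    ,,    ${}^{12}C_{9} \cdot 1^{9} \cdot 5^{3}$        ,,   =          27,500
  8   ,,    ,,    ${}^{12}C_{8} \cdot 1^{8} \cdot 5^{4}$        ,,   =         309,375
  7   ,,    ,,    ${}^{12}C_{7} \cdot 1^{7} \cdot 5^{5}$        ,,   =       2,475,000
  6   ,,    ,,    ${}^{12}C_{6} \cdot 1^{6} \cdot 5^{6}$        ,,   =      14,437,500
  5   ,,    ,,    ${}^{12}C_{5} \cdot 1^{5} \cdot 5^{7}$        ,,   =      61,875,000
  4   ,,    ,,    ${}^{12}C_{4} \cdot 1^{4} \cdot 5^{8}$        ,,   =     193,359,375
  3   ,,    ,,    ${}^{12}C_{3} \cdot 1^{3} \cdot 5^{9}$        ,,   =     429,687,500
  2   ,,    ,,    ${}^{12}C_{2} \cdot 1^{2} \cdot 5^{10}$       ,,   =     644,531,250
  1   ,,    ,,    ${}^{12}C_{1} \cdot 1^{1} \cdot 5^{11}$       ,,   =     585,937,500
  0   ,,    ,,    ${}^{12}C_{0} \cdot 1^{0} \cdot 5^{12}$       ,,   =     244,140,625

                                              Total  =   2,176,782,336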



EQUATION OF THE CURVE OF ERROR. 275 

The most probable number of sixes is 2, of which the chance is
about $\frac{3}{10}$ (644,531,250 out of 2,176,782,336). In four-fifths of the trials
there will probably be 1, 2, or 3 sixes.
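The counts in the table of twelve throws can be reproduced with a few lines of code. This is a small sketch, assuming fair six-sided dice; the helper name `ways` is illustrative only.

```python
from math import comb


def ways(r: int, n: int = 12, faces: int = 6) -> int:
    """Number of the faces**n arrangements that show exactly r sixes."""
    return comb(n, r) * (faces - 1) ** (n - r)


total = 6 ** 12
counts = {r: ways(r) for r in range(13)}

# The number of sixes with the largest count is the most probable number.
most_probable = max(counts, key=counts.get)
print(most_probable, counts[most_probable] / total)

# Share of trials showing 1, 2, or 3 sixes.
print(sum(counts[r] for r in (1, 2, 3)) / total)
```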

E.g., to investigate whether drunkenness occurs chiefly on the night of
pay-day (suppose Saturday). If the maximum number of convictions in a
week is on Saturday in 10 weeks out of 12 selected at random, we have
an event whose probability is only

$$\frac{1650}{2176782336} = \frac{1}{1300000} \text{ (about),}$$

if the position of the day in the week had nothing to do with it.

It must be noticed that the probability that event will occur 10 times 

out of 12 on any the same week-day is much greater, viz., . 

Probability that event will occur at least 10 times is
$\frac{1650 + 60 + 1}{2176782336} = \frac{1711}{2176782336}$, which is much the same as before.

Similarly any questions depending on the occurrence of an event in
the same month may be worked out. In this case $m_1 = 1$, $m_2 = 11$,
$n$ = number of years investigated.

If a bag contains $m$ balls of different colours, $m_1$ white, $m_2$ red,
$m_3$ green, &c., the probability that $r_1$ white, $r_2$ red, $r_3$ green, &c., will
occur is the coefficient of $p_1^{r_1} p_2^{r_2} p_3^{r_3} \ldots$ in the expansion of $(p_1 + p_2 + p_3 + \ldots)^n$
by the multinomial theorem, where $p_1 = \frac{m_1}{m}$, $p_2 = \frac{m_2}{m}$, &c.
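As a hedged illustration of the multinomial case, the chance of a given group can be computed as the multinomial coefficient multiplied by the corresponding powers of the $p$'s. The function name `multinomial_chance` and its arguments are assumptions made for this sketch; drawings are taken to be independent (each ball replaced, or one ball from each of many similar bags).

```python
from math import factorial, prod


def multinomial_chance(counts, balls):
    """Chance of drawing counts[i] balls of colour i in sum(counts) independent
    drawings, when each drawing is from balls[i] balls of colour i."""
    m = sum(balls)
    n = sum(counts)
    coefficient = factorial(n)
    for r in counts:
        coefficient //= factorial(r)  # stays an integer at every step
    return coefficient * prod((b / m) ** r for b, r in zip(balls, counts))


# Two colours reduce to the binomial case: 2 sixes in 12 throws of a die.
print(multinomial_chance([2, 10], [1, 5]))
```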

Notice that the probability of an event, if it was a chance 
occurrence, is not the same as the probability that the event was 
a chance occurrence. 

If 13 trumps appeared in the same hand, we could not say
that the chances were (${}^{51}C_{12}$ =) 158,753,389,900 to 1 that the
hand was "faked," but we should have strong though incommensurable
evidence on the point.

Deduction of Equation of Curve of Error. 

* We can now proceed to the determination of the equation of 
the curve of error. 

The chance of $r$ successes is greatest when $r$ is the greatest integer
in $pn$; this is found by the ordinary method of determining the maximum
term in a binomial expansion.

Let P be this maximum value $= {}^nC_{pn} \cdot p^{pn} q^{qn}$, making the supposition
for brevity that $pn$ is integral, which will not affect the proof.

$$P = \frac{n!}{(pn)!\,(qn)!}\, p^{pn} q^{qn}, \text{ for } pn + qn = n.$$



276 ELEMENTS OF STATISTICS. 

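Let $P_x$ be chance of $pn + x$ white balls.

$$\text{Then } P_x = P \times \left(\frac{p}{q}\right)^x \frac{qn\,(qn-1) \ldots (qn-x+1)}{(pn+1)(pn+2) \ldots (pn+x)}$$

$$= P \cdot \frac{\left(1 - \frac{1}{qn}\right)\left(1 - \frac{2}{qn}\right) \ldots \left(1 - \frac{x-1}{qn}\right)}{\left(1 + \frac{1}{pn}\right)\left(1 + \frac{2}{pn}\right) \ldots \left(1 + \frac{x}{pn}\right)}$$

Taking logarithms of both sides:

$$\log P_x = \log P + \log\left(1 - \frac{1}{qn}\right) + \log\left(1 - \frac{2}{qn}\right) + \ldots + \log\left(1 - \frac{x-1}{qn}\right) - \log\left(1 + \frac{1}{pn}\right) - \ldots - \log\left(1 + \frac{x}{pn}\right)$$

$$= \log P - \left(\frac{1}{qn} + \frac{1}{2}\cdot\frac{1}{(qn)^2} + \ldots\right) - \left(\frac{2}{qn} + \frac{1}{2}\left(\frac{2}{qn}\right)^2 + \ldots\right) - \ldots - \left(\frac{x-1}{qn} + \frac{1}{2}\left(\frac{x-1}{qn}\right)^2 + \ldots\right)$$

$$\quad - \left(\frac{1}{pn} - \frac{1}{2}\cdot\frac{1}{(pn)^2} + \ldots\right) - \left(\frac{2}{pn} - \frac{1}{2}\left(\frac{2}{pn}\right)^2 + \ldots\right) - \ldots - \left(\frac{x}{pn} - \frac{1}{2}\left(\frac{x}{pn}\right)^2 + \ldots\right)$$

$$= \log P - \frac{x(x-1)}{2qn} - \frac{x(x+1)}{2pn} - \frac{(x-1)\,x\,(2x-1)}{12q^2n^2} + \frac{x(x+1)(2x+1)}{12p^2n^2} - \ldots$$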

Now when $n$, the supposed number of bags, is very great, $pn$ and
$qn$ are also very great if neither $p$ nor $q$ are very small; $x$, the divergence
from $pn$, ranges from 0 to $qn$ on the positive side of the maximum,
and 0 to $-pn$ on the negative side. The chance of so great a divergence
as $-pn$ or $qn$ is very small. The chance of a small divergence,
such as $x = 1, 2, 3 \ldots$ is very nearly equal to the maximum chance P.
For instance, if $x = 3$:

$$P_3 = P \times \left(\frac{p}{q}\right)^3 \times \frac{qn\,(qn-1)(qn-2)}{(pn+1)(pn+2)(pn+3)}$$

$$= P \left(1 - \frac{1}{nq}\right)\left(1 - \frac{2}{nq}\right)\left(1 + \frac{1}{pn}\right)^{-1}\left(1 + \frac{2}{pn}\right)^{-1}\left(1 + \frac{3}{pn}\right)^{-1}$$

$$= \text{(expanding each term) } P \times \left[1 - \frac{3}{nq} - \frac{6}{np} + \text{terms involving } \frac{1}{n^2}\right].$$



[Diagram. To face page 277.]



EQUATION OF THE CURVE OF ERROR. 277

Hence, in order that $P_x$ may have at once a finite value and one
with a finite divergence from P, $x$ must be very great compared with
unity, but small compared with $n$.

Re-write the above equation for $P_x$, neglecting $\frac{x}{n}$:

$$\log P_x = \log P - \frac{x^2}{2qn} - \frac{x^2}{2pn} - \frac{x^3}{6q^2n^2} + \frac{x^3}{6p^2n^2} - \ldots$$

If $\frac{x^2}{n}$ were negligible, $\log P_x = \log P$.

If $\frac{x}{n}$ were finite, and therefore $x$ infinite, both $\frac{x^2}{n}$ and $\frac{x^3}{n^2}$ would be
infinite.

That part of the resulting curve, which shows finite curvature, is
found by assuming that $x$ and $n$ are infinite, but $\frac{x^2}{n}$ finite.

[The general argument is similar to that used for constructing the
finite part of a parabola on a finite scale, for there $\frac{y^2}{x}$ is finite.]

On this hypothesis $\frac{x^3}{n^2} = \left(\frac{x^2}{n}\right)^2 \cdot \frac{1}{x}$, $\frac{x^4}{n^3} = \left(\frac{x^2}{n}\right)^2 \cdot \frac{1}{n}$, and these and
presumably further terms are infinitesimal. The equation of the finite
part of the curve is therefore

$$\log P_x = \log P - \frac{x^2}{2qn} - \frac{x^2}{2pn},$$

or $P_x = P \cdot e^{-\frac{x^2}{2pqn}}$, since $p + q = 1$.

Writing $y$ for $P_x$, $y = P e^{-\frac{x^2}{2pqn}}$.

The curve is horizontal near the maximum ordinate, for $P_x = P$ when
$x$ is small, and extends to infinity in both directions, the axis of $x$ being
an asymptote, for when $x = \pm\infty$, $y$ is zero.

When $\frac{x^3}{n^2}$ is negligible, the curve is very approximately symmetrical;
this symmetry may be shown to extend over the finite part of the
curve, when $n$ is large.* The annexed diagram illustrates the extent of
the asymmetry for a small value of $n$.



* By considering the values of the various quantities in relation to the 
table on p. 281. 
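The closeness of the limiting curve $y = P e^{-x^2/2pqn}$ to the exact binomial terms can be examined numerically. The sketch below assumes $n = 1000$ and $p = \frac{1}{2}$ purely for illustration; the function names are hypothetical.

```python
from math import comb, exp


def binomial_chance(n, p, successes):
    """Exact chance of the given number of successes in n trials."""
    return comb(n, successes) * p ** successes * (1 - p) ** (n - successes)


def curve_of_error(n, p, x):
    """P * exp(-x^2 / (2 p q n)), where P is the largest binomial term."""
    q = 1 - p
    centre = round(p * n)  # pn is assumed to be integral
    return binomial_chance(n, p, centre) * exp(-x * x / (2 * p * q * n))


n, p = 1000, 0.5
for x in (0, 5, 10, 20, 30):
    exact = binomial_chance(n, p, round(p * n) + x)
    print(x, exact, curve_of_error(n, p, x))
```

For divergences that are large compared with unity but small compared with $n$, the two printed columns agree closely, in line with the argument above.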



278 



ELEMENTS OF STATISTICS. 



Relation of Curve of Error to Statistics. 

The following example shows how Quetelet fitted his figures
to given observations:*

Chest Measurements of Scotch Soldiers.

Column headings:
1. Chest measurement, inches.
2. No. of men.
3. Proportional nos.
4. No. between given measurement and mean.
5. Rank in scale of precision.
6. Calculated rank of measurement.
7. Precision of calculated rank.
8. Calculated no. of observations to each measurement.
9. Differences between columns 3 and 8.

   1.       2.        3.        4.       5.       6.       7.        8.       9.

   33        3         5     5,000     ...      ...     5,000        7        2
   34       18        31     4,995    +52       50      4,993       29        2
   35       81       141     4,964     42.5     42.5    4,964      110       31
   36      185       322     4,823     33.5     34.5    4,854      323        1
   37      420       732     4,501     26.0     26.5    4,531      732      ...
   38      749     1,305     3,769     18.0     18.5    3,799    1,333       28
   39    1,073     1,867     2,464     10.5     10.5    2,466    1,838       29
   ,,      ...       ...       597     +2.5      2.5      628      ...      ...
   40    1,079     1,882     1,285     -5.5      5.5    1,359    1,987      105
   41      934     1,628     2,913     13       13.5    3,034    1,675       47
   42      658     1,148     4,061     21       21.5    4,130    1,096       52
   43      370       645     4,706     30       29.5    4,690      560       85
   44       92       160     4,866     35       37.5    4,911      221       61
   45       50        87     4,953     41       45.5    4,980       69       18
   46       21        38     4,991     49.5     53.5    4,996       16       22
   47        4         7     4,998    -56       61.5    4,999        3        4
   48        1         2     5,000     ...      ...     5,000        1        1

  Total  5,738    10,000       ...     ...      ...       ...   10,000      488



The chest measurements of 5,738 soldiers were ranged in
order of magnitude, and the numbers of men at each measurement
placed in column 2 against the corresponding number
of inches in column 1. Column 3 gives numbers proportional
to those in column 2, such that their sum is 10,000. It is assumed
that there are 5,000 cases on each side of the (unknown) mean;
then 5,000 cases occur between 33 inches and the mean, 4,995
between 34 inches and mean, and so on, till we find (in column 4)
597 cases between 39 inches and mean. Similarly 1,285 occur
between 40 inches and the mean.

Referring now to the scale of precision, we find that 4,995
cases corresponds to rank 52; 4,964 to rank 42.5, and so on.
The numbers of these ranks are placed in column 5.

Now if the observations fitted the curve exactly the distances 

* Ibid., p. 400.



EQUATION OF THE CURVE OF ERROR. 279

between two ranks corresponding to two successive inches should
be always the same. This is not exactly the case: 34 inches
corresponds to rank 52; 35 inches to 9.5 ranks lower, 42.5;
36 to 9 ranks lower, and so on. It is necessary to assume
some regular interval, which will show as close a correspondence
as possible between theory and observation. It is assumed that
a difference of 1 inch corresponds to 8 ranks, and column 5 is
"smoothed" into column 6 on this hypothesis. The process is
then reversed; against each rank in column 6 is placed in
column 7 the corresponding number from the scale of precision.
Column 8 is then calculated from column 7 in the reverse way to
that by which column 4 is reckoned from column 3. The closeness
of the resemblance between columns 8 and 3 shows how
nearly the measurements fit the theory. In column 9 are placed
the differences between the numbers in columns 3 and 8; the
percentage which the sum of these differences is of the total
number (10,000), is a measure of the fit. If the observations are
plotted out in the same way as the later figures which form the
$L_1, L_2 \ldots$ figure on the diagram at the end of the book, this
ratio of the sum of the differences to the whole number is a
rough measure of the ratio of the sum of the areas included
between the lines $L_1 L_2 \ldots$ and the curve of error $A_1 A_2 A_3$.
In the case just discussed the misfit is 4.88 per cent.
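The measure of misfit described here is simply the sum of the absolute differences between columns 3 and 8, taken as a percentage of 10,000. A short sketch, with the two columns entered from the table of chest measurements (the variable names are illustrative):

```python
# Column 3: proportional numbers observed, for 33 to 48 inches.
proportional = [5, 31, 141, 322, 732, 1305, 1867, 1882,
                1628, 1148, 645, 160, 87, 38, 7, 2]
# Column 8: numbers calculated from the smoothed ranks.
calculated = [7, 29, 110, 323, 732, 1333, 1838, 1987,
              1675, 1096, 560, 221, 69, 16, 3, 1]

# Column 9: differences between columns 3 and 8, taken without sign.
differences = [abs(a - b) for a, b in zip(proportional, calculated)]

misfit_percent = 100 * sum(differences) / sum(proportional)
print(sum(differences), misfit_percent)
```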

The following seems a better method of estimating the fit,
for it is less dependent on the accidental divergences caused by
the particular interval of measurement taken. Construct a figure
to represent the scale of precision on p. 273; fit as closely as
possible to this another figure, whose ordinates represent the
numbers "at or above" given measurements represented by the
abscissae; the whole curve will be nearly the shape of the figure
facing p. 155, when the smoothed curve may stand for the scale
of precision, symmetrical about its median, and the original
jagged line for the observations. The closeness of the jagged to
the smoothed line shows the fit; and this will not be altered by a
slight shifting of the inch or shilling limits we adopt, which in
the other method often makes a great difference to the regularity*
in discontinuous observations. Moreover, it is not

* E.g., two great numbers at 29s. 9d. and 30s. 3d. respectively will both
be in the same group if our limits are "29s. 6d. and not so much as 30s. 6d.,"
&c., but in different groups if the limits are "29s. and not so much as
30s.," &c.



280 ELEMENTS OF STATISTICS. 

necessary by the latter method to sort the observations into 
groups at all. 

We cannot deduce this equation from the most general
hypotheses (stated on p. 303, infra) without the use of the
integral calculus. It is, however, more convenient to use a scale
of precision evaluated from the equation of the curve as obtained
in other ways. In the next table the numbers under $x$ correspond
to Quetelet's "ranks," but the divergence taken as unity
corresponds to the quantity $\sqrt{2pqn}$, which is rank 22.35 . . .
(for $\sqrt{2 \times \frac{1}{2} \times \frac{1}{2} \times 999} = 22.35$ . .) in Quetelet's scale. The quantities
under F(x) correspond to those in the earlier scale of precision;
so that against any value of $x$ is found the chance that an
observation shall be between the average and $x$. The figures
are adapted from Lexis' Massenerscheinungen, pp. 93, 94, and
Quetelet, ibid., p. 389. This table and Quetelet's can be used
indifferently; they yield very nearly the same results.

The diagram at the end of the book shows the relation of
the binomial with index 4, 10, and 32, to the curve of error.



Note. For the evaluation of large factorials met with in work of this
sort, Stirling's approximate formula is useful. This is:

$$(m-1)! = m^{m-\frac{1}{2}} \cdot (2\pi)^{\frac{1}{2}} \cdot e^{-m + \frac{1}{12m} - \frac{1}{360m^3}}$$

The following adaptation will serve for four significant figures:

$$\log_{10} n! = (n + \tfrac{1}{2}) \log_{10}(n+1) - .0352 + \left(-n + \frac{1}{12(n+1)}\right) \times .43429$$
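The adaptation of Stirling's formula can be tried against exact factorials. This is only a sketch; the function name `log10_factorial_approx` is an assumption, and the constants are those of the note (.0352, and .43429 for $\log_{10} e$).

```python
from math import factorial, log10


def log10_factorial_approx(n: int) -> float:
    """Common logarithm of n! by the adaptation of Stirling's formula."""
    return ((n + 0.5) * log10(n + 1)
            - 0.0352
            + (-n + 1 / (12 * (n + 1))) * 0.43429)


for n in (10, 50, 100):
    print(n, log10_factorial_approx(n), log10(factorial(n)))
```

The two printed values for each $n$ differ only in the fourth decimal place or beyond for these cases.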



EQUATION OF THE CURVE OF ERROR. 



281 



Values of F(x) for Different Values of x, where $F(x) = \frac{1}{\sqrt{\pi}} \int_0^x e^{-x^2}\,dx$.

   x.    F(x).  |    x.    F(x).  |    x.    F(x).  |    x.    F(x).  |    x.    F(x).

  .00    .000   |   .36    .195   |   .71    .342   |  1.06    .433   |  1.41    .477
  .01    .006   |   .37    .200   |   .72    .346   |  1.07    .435   |  1.42    .478
  .02    .011   |   .38    .205   |   .73    .349   |  1.08    .437   |  1.43    .478
  .03    .017   |   .39    .209   |   .74    .352   |  1.09    .438   |  1.44    .479
  .04    .023   |   .40    .214   |   .75    .356   |  1.10    .440   |  1.45    .480
  .05    .028   |   .41    .219   |   .76    .359   |  1.11    .442   |  1.46    .481
  .06    .034   |   .42    .224   |   .77    .362   |  1.12    .443   |  1.47    .481
  .07    .039   |   .43    .228   |   .78    .365   |  1.13    .445   |  1.48    .482
  .08    .045   |   .44    .233   |   .79    .368   |  1.14    .447   |  1.49    .482
  .09    .051   |   .45    .238   |   .80    .371   |  1.15    .448   |  1.50    .483
  .10    .056   |   .46    .242   |   .81    .374   |  1.16    .450   |  1.52    .484
  .11    .062   |   .47    .247   |   .82    .377   |  1.17    .451   |  1.54    .485
  .12    .067   |   .48    .251   |   .83    .380   |  1.18    .452   |  1.56    .486
  .13    .073   |   .49    .256   |   .84    .383   |  1.19    .454   |  1.58    .487
  .14    .078   |   .50    .260   |   .85    .385   |  1.20    .455   |  1.60    .488
  .15    .084   |   .51    .265   |   .86    .388   |  1.21    .456   |  1.62    .489
  .16    .090   |   .52    .269   |   .87    .391   |  1.22    .458   |  1.64    .490
  .17    .095   |   .53    .273   |   .88    .393   |  1.23    .459   |  1.66    .491
  .18    .100   |   .54    .277   |   .89    .396   |  1.24    .460   |  1.68    .491
  .19    .106   |   .55    .282   |   .90    .398   |  1.25    .461   |  1.70    .492
  .20    .111   |   .56    .286   |   .91    .401   |  1.26    .463   |  1.72    .493
  .21    .117   |   .57    .290   |   .92    .403   |  1.27    .464   |  1.74    .493
  .22    .122   |   .58    .294   |   .93    .406   |  1.28    .465   |  1.76    .494
  .23    .128   |   .59    .298   |   .94    .408   |  1.29    .466   |  1.78    .494
  .24    .133   |   .60    .302   |   .95    .410   |  1.30    .467   |  1.80    .495
  .25    .138   |   .61    .306   |   .96    .413   |  1.31    .468   |  1.82    .495
  .26    .143   |   .62    .310   |   .97    .415   |  1.32    .469   |  1.84    .495
  .27    .149   |   .63    .314   |   .98    .417   |  1.33    .470   |  1.86    .496
  .28    .154   |   .64    .317   |   .99    .419   |  1.34    .471   |  1.88    .496
  .29    .159   |   .65    .321   |  1.00    .421   |  1.35    .472   |  1.90    .496
  .30    .164   |   .66    .325   |  1.01    .423   |  1.36    .473   |  1.92    .496
  .31    .169   |   .67    .328   |  1.02    .425   |  1.37    .474   |  1.94    .497
  .32    .175   |   .68    .332   |  1.03    .427   |  1.38    .475   |  1.96    .497
  .33    .180   |   .69    .335   |  1.04    .429   |  1.39    .475   |  1.98    .497
  .34    .185   |   .70    .339   |  1.05    .431   |  1.40    .476   |  2.00    .498
  .35    .190   |                 |                 |                 |  2.05    .498

   x.     F(x).

  2.20    .49907
  2.50    .49980
  3.00    .499,989
  4.00    .499,999,992
  5.00    .499,999,999,999,2
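The tabulated values agree with one half of the error function, so any entry can be recomputed. A minimal sketch, assuming the standard library function `math.erf`; the name `F` simply mirrors the table heading.

```python
from math import erf


def F(x: float) -> float:
    """Chance that an observation lies between the average and x,
    the modulus being taken as the unit of x."""
    return 0.5 * erf(x)


for x in (0.5, 1.0, 2.0, 2.5):
    print(x, round(F(x), 5))
```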



Special Points on the Curve. 

Before we can show how to fit observations to this table, we
must consider the equation of the law of error more closely.

Definition. The probable error of a series of observations



282 ELEMENTS OF STATISTICS. 

is that divergence from their mean on either side, within which
exactly half the observations lie. This quantity is more appropriately
called the Quartile Deviation.*

In the scale of precision the corresponding number is .25,
either above or below the mean; the corresponding rank in
Quetelet's scale is about 10.7. In the table just given, the value
of F(x) which corresponds to the probable error is .25, and the
corresponding value of $x$ is calculated to be .47694, a quantity
usually designated by $\rho$.

An approximation to the probable error for a given series
of observations is obtained by arranging all the observations
in order of magnitude; marking the magnitude, say $a$, above
which 25 per cent. of the observations lie, and the magnitude,
say $b$, below which 25 per cent. lie. Half the difference between
$a$ and $b$ is the probable error.
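The approximation just described can be sketched in code as follows. The function name `probable_error` and the rule of interpolating linearly between neighbouring ranged observations are assumptions of this sketch; the text itself does not prescribe how the 25 per cent. points are to be located between two observations.

```python
def probable_error(observations):
    """Half the distance between the magnitudes that cut off the lowest and
    the highest quarter of the ranged observations."""
    ranged = sorted(observations)

    def magnitude_at(fraction):
        # Linear interpolation between neighbouring ranged observations.
        position = fraction * (len(ranged) - 1)
        lower = int(position)
        upper = min(lower + 1, len(ranged) - 1)
        return ranged[lower] + (position - lower) * (ranged[upper] - ranged[lower])

    b = magnitude_at(0.25)  # 25 per cent. of the observations lie below b
    a = magnitude_at(0.75)  # 25 per cent. of the observations lie above a
    return (a - b) / 2


print(probable_error([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```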

A useful way of illustrating this is to say that if one observa- 
tion is chosen at random out of a group, it is as likely as not that 
it will lie not further from the average than the probable error. 

In the figure given at the end of the book, the probable
errors are at the points $P_1$, $P_2$ for the curves A and C, and $p_1$, $p_2$
for the B.

By means of the approximation given in the last paragraph
but two, the curve of error can be fitted to a series of observations
by equating the probable error so determined to the value
of $x = .47694$, and comparing the values of F(x) with the ranged
observations; but though this method is simple and rapid it is
not the best.

By a suitable change of scale for ordinate and abscissa the
equation given on p. 277 can be written $y = e^{-x^2}$ and this is the
most general equation of the normal curve of error.

If $x = 0$, $y = 1$; hence the unit of ordinate is the number of
cases at zero error; from the table on p. 281 above, the unit of . . .

Since $\int_{-\infty}^{+\infty} e^{-x^2}\,dx$ is shown in the integral calculus to be $\sqrt{\pi}$,
the area contained between the curve and the axis of $x$ is $\sqrt{\pi}$.
If the equation is written $y = \frac{1}{\sqrt{\pi}} e^{-x^2}$ this area is 1, that is, unit of area
equals unit of probability (that is certainty) and the area contained by any two



* See Yule in Statistical Journal, 1896, p. 330.



EQUATION OF THE CURVE OF ERROR. 283 

ordinates, the curve and the axis of abscissa equals the probability
of an occurrence between the errors represented by the abscissae
of those ordinates.*

If the curve is traced from either of the tables, it will be
found that it changes from concavity to convexity on each side
of the maximum ordinate, at such points as $S_1$, $S_2$, $s_1$, $s_2$ in the
diagram at the end. If the unit of abscissa is taken as a concrete
quantity, say, 1 inch, and the abscissa ($OS_2$) of this point of inflexion
is $\epsilon$, in the same units, then the equation of the curve is

$$y = \frac{1}{\sqrt{\pi}}\, e^{-\frac{x^2}{2\epsilon^2}}, \text{ for then } \frac{d^2y}{dx^2} = \frac{y}{\epsilon^2}\left(\frac{x^2}{\epsilon^2} - 1\right) = 0, \text{ if } x = \pm\epsilon,$$

and therefore the points $\left(\pm\epsilon, \frac{1}{\sqrt{\pi e}}\right)$ are points of inflexion.

Let $2\epsilon^2 = c^2 = \frac{1}{h^2}$; then the equation is $y = \frac{1}{\sqrt{\pi}}\, e^{-\frac{x^2}{c^2}}$, or $y = \frac{1}{\sqrt{\pi}}\, e^{-h^2x^2}$.

The area of this curve is $\int_{-\infty}^{+\infty} \frac{1}{\sqrt{\pi}}\, e^{-\frac{x^2}{c^2}}\,dx = c$.

Choose the unit of ordinate so that this area shall be unity.
Then the equation is

$$y = \frac{1}{c\sqrt{\pi}}\, e^{-\frac{x^2}{c^2}} = \frac{h}{\sqrt{\pi}}\, e^{-h^2x^2}.$$

$c$, which thus determines the unit of abscissae of the curve, is
called the modulus.

Determination of the Modulus. 

Suppose we have a series of observations which we know are 
selected from a group which conforms to the law of error, it is 
required to find, from the observations, the centre and modulus 
of the curve from which they would come with least impro- 
bability. 

Let $x_1, x_2, \ldots x_n$ be the observed values. Let $\bar{x}$ be their arithmetic
mean. Let $\delta_1, \delta_2 \ldots$ be the divergencies of the values from
this mean.



* Hence chance of error between $x$ and $x + dx$ is $\frac{1}{c\sqrt{\pi}}\, e^{-\frac{x^2}{c^2}}\,dx$.



284 ELEMENTS OF STATISTICS. 

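Then $\delta_1 = x_1 - \bar{x}$, $\delta_2 = x_2 - \bar{x}$, . . . ; $\Sigma\delta = \Sigma x - n\bar{x} = 0$, for
$\bar{x} = \frac{\Sigma x}{n}$.

Let the equation of the curve to which the observations belong be

$$y = \frac{1}{c\sqrt{\pi}}\, e^{-\frac{(x-k)^2}{c^2}}, \text{ where } c \text{ and } k \text{ have to be determined.}$$

Let $y_1, y_2, \ldots y_r \ldots y_n$ be the values of $y$ corresponding to
$x_1, x_2, \ldots x_r \ldots x_n$.

Then $y_r\,dx$ is the chance that an object taken at random from the
group conforming to the curve shall be between $x_r$ and $x_r + dx$.*
Let P be the chance that the $n$ given observations occur together.
Then $P = y_1 \cdot y_2 \ldots y_n$

$$= \frac{1}{(c\sqrt{\pi})^n}\, e^{-\frac{(x_1-k)^2}{c^2}} \times e^{-\frac{(x_2-k)^2}{c^2}} \times \ldots \times e^{-\frac{(x_n-k)^2}{c^2}}$$

$$= \frac{1}{(c\sqrt{\pi})^n}\, e^{-\frac{\Sigma(x-k)^2}{c^2}}.$$

Now $\Sigma(x-k)^2 = \Sigma(\delta + \bar{x} - k)^2 = \Sigma\delta^2 + 2(\bar{x}-k)\,\Sigma\delta + n(\bar{x}-k)^2$
$= \Sigma\delta^2 + n(\bar{x}-k)^2$, since $\Sigma\delta = 0$.

Whatever value we assign to $c$, P will be greatest when the quantity
$(\bar{x}-k)^2$ is least, that is when $k = \bar{x}$.

Giving $k$ this value,

$$P = \frac{1}{(c\sqrt{\pi})^n}\, e^{-\frac{\Sigma\delta^2}{c^2}},$$

$$\frac{dP}{dc} = \pi^{-\frac{n}{2}} \left(-n\,c^{-n-1} + 2\,\Sigma\delta^2\, c^{-n-3}\right) e^{-\frac{\Sigma\delta^2}{c^2}}.$$

In order that P may be a maximum, $\frac{dP}{dc}$ must = 0.†

Hence $c^2 = \frac{2\Sigma\delta^2}{n}$.‡
Thus the curve required has its centre at the arithmetic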



* When the magnitude of the observations is discontinuous, as in
Quetelet's scale, no $dx$ is necessary; but if the magnitude is continuous the
probability of any defined error is zero, and the $y$ is the chance of an error
between infinitesimal limits.

† And $\frac{d^2P}{dc^2}$ be negative, which can be shown to be the case here.

‡ The proof here given is based on Merriman's Method of Least Squares;
but it is suggested that his statement "for a given system of errors, it must
be considered that the observations have been as precise as possible," § 65,
is unnecessarily obscure.



EQUATION OF THE CURVE OF ERROR. 285 

mean of the observed values and modulus equal to $\sqrt{\frac{2\Sigma\delta^2}{n}}$,
where the $\delta$'s are the differences between the observed values
and their mean.*

$c$, so determined, is called the modulus, $h = \frac{1}{c}$ is called the
precision. Professor Edgeworth proposes to call $c^2 = \frac{2\Sigma\delta^2}{n}$ the
fluctuation.†



It can be shown that half the curve $y = \frac{1}{c\sqrt{\pi}}\, e^{-\frac{x^2}{c^2}}$ is included
between the ordinates corresponding to $\pm r$, when $r = .47694\,c$,
as found from the scale of precision.

It can be shown that the arithmetic average ($\eta$) of all the
errors, considered positive,‡ is given by

$$\eta = \frac{c}{\sqrt{\pi}}, \text{ whence } r = .8453\,\eta.$$

$\eta$ is the abscissa of the centroid of the positive half of the
curve.

$\eta$ is more easily calculated from the observations than $c$, and
can in some cases be used in its stead.§ $\eta$ is called the average
error or mean of errors.
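The quantities defined so far can all be deduced from the modulus once it has been found from the observations. The following sketch gathers them in one place; the function name `curve_constants` and the dictionary keys are illustrative, and the constant .47694 is taken from the text.

```python
from math import pi, sqrt


def curve_constants(observations):
    """Centre and modulus of the curve of error fitted to the observations,
    with the quantities deduced from the modulus."""
    n = len(observations)
    mean = sum(observations) / n
    sum_of_squares = sum((x - mean) ** 2 for x in observations)
    c = sqrt(2 * sum_of_squares / n)  # modulus
    return {
        "mean": mean,
        "modulus": c,
        "precision": 1 / c,                    # h = 1/c
        "fluctuation": c * c,                  # c^2
        "probable_error": 0.47694 * c,         # r
        "mean_error": c / sqrt(pi),            # eta
        "error_of_mean_square": c / sqrt(2),   # epsilon, since 2 * epsilon^2 = c^2
    }


constants = curve_constants([1, 2, 3, 4, 5])
print(constants)
```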

In the table given on p. 281, the modulus is taken as 1;
when $x = 1$, F(x) = .421 . . ; that is .421 . . of the curve lies between
the ordinate corresponding to the abscissa equal to the modulus
and the central ordinate. When $x = 2$, F(x) = .4976 . . Hence the
chance of an observation showing a divergence from the mean
on either side of more than twice the modulus = .005 . . ; the
corresponding rank in Quetelet's table is 45.1.

$\epsilon$, used on p. 283, is now seen to be equal to $\sqrt{\frac{\Sigma\delta^2}{n}}$ and is
called the error of mean square,¶ or the standard deviation. If

* See also p. 307, infra.

† Stat. Journal: Jubilee Number, p. 188; and p. 298, infra.

‡ $\eta = \frac{2}{c\sqrt{\pi}} \int_0^{\infty} x\, e^{-\frac{x^2}{c^2}}\,dx = \frac{c}{\sqrt{\pi}}$.

§ Stat. Journal, loc. cit.

¶ Both $\epsilon$ and $\eta$ have been called mean error, and this term has become
misleading.



286 ELEMENTS OF STATISTICS. 

In the diagram at the end $OP_1$, $OP_2$ are the probable errors,
$OM_1$, $OM_2$ the moduli, $OS_1$, $OS_2$ the errors of mean square
and $OE_1$, $OE_2$ the mean errors for the curves A and C. $Op_1$,
$Op_2$, $Om_1$, $Om_2$, $Oe_1$, $Oe_2$ are corresponding quantities for the
curve B. It is to be noticed that the line $XOX_1$ is an asymptote
of each of the three curves drawn. In the figure $OG_5$ equals
about twice the modulus. The ratio of the small area to the
right of a vertical line through $G_5$ to the area of half the curve is
the chance that twice the modulus shall be exceeded. The
distance between $OX_1$ at three times the modulus and the curve
is too small to be shown.

From the foregoing it will be seen that any curve of error,
$y = \frac{1}{c\sqrt{\pi}}\, e^{-\frac{x^2}{c^2}}$, can be obtained by projection from the same
standard curve $y = e^{-x^2}$, just as any ellipse can be obtained by
projection from a circle; but as ellipses differ from one another
in virtue of different values of their eccentricity, so curves of
error differ from one another in virtue of different values of their
modulus. As we have seen, on any such curve there are certain
definite points (the positions of the modulus, mean error, probable
error, and error of mean square); if then we have the same
units of abscissae, such as 1 inch, for two sets of observations,
these points will take different positions. If for one set the
modulus is 2 inches, and for another 1 inch, .843 of the observations
will be within 2 inches of the mean in the first case, and
within 1 inch of the mean in the second. If we regard the
observations as attempts to hit the mean and the divergencies
as errors, the aiming in the second case is twice as precise as in
the first. The precision thus defined is in inverse proportion
to the modulus, and is therefore suitably measured by $h = \frac{1}{c}$.

If the standard form of equation is adopted in both cases, the
area of both curves will be unity, and their actual shapes those
of $C_1 C_2 C_3$ and $B_1 B_2 B_3$.

The calculation of the precision of a set of observations does
not require either of the tables, but is simply the evaluation of
$h = \sqrt{\frac{n}{2\Sigma\delta^2}}$ from the observations themselves.

Another form of the equation in common use is $y = \frac{n}{c\sqrt{\pi}}\, e^{-\frac{x^2}{c^2}}$,
where $n$ is the whole number of observations in question. The

EQUATION OF THE CURVE OF ERROR. 287



area of this curve is $n$, so that unit area corresponds to one
observation.

Compare now this form with that obtained from the limit
of the binomial expansion, $y = P \cdot e^{-\frac{x^2}{2npq}}$.

If $x = 0$, $y = P$; hence P is the maximum ordinate.
Adjusting the unit of ordinate so that the area of the curve is $n$,

$$y = \frac{n}{\sqrt{2\pi npq}}\, e^{-\frac{x^2}{2npq}}.$$

Hence $2npq = c^2$, and therefore $2np(1-p) = c^2$.



Examples. 

I. The following figures, which are taken from Professor
Westergaard's Die Grundzüge der Theorie der Statistik, but
treated in a manner different from his, will serve to illustrate the
meaning of the formulae, and show how to fit a curve to observations;
limits of space prevent more elaborate examples.



Births in Denmark.

              Number.            Percentage     Difference
  Year.    Total.     Boys.      Boys of        from
                                 Total.         Average.

  1860     54,797    28,308      51.66          + .23
  1861     53,747    27,506      51.17          - .26
  1862     53,011    27,300      51.50          + .07
  1863     53,939    27,841      51.62          + .19
  1864     52,884    27,334      51.68          + .25
  1865     55,434    28,483      51.38          - .05
  1866     57,353    29,747      51.87          + .44
  1867     54,763    28,036      51.20          - .23
  1868     56,546    28,985      51.26          - .17
  1869     54,056    27,577      51.02          - .41
  1870     56,472    29,144      51.60          + .17
  1871     56,407    29,045      51.49          + .06
  1872     57,274    29,462      51.44          + .01
  1873     58,616    30,115      51.37          - .06
  1874     59,324    30,594      51.57          + .14
  1875     61,791    31,784      51.44          + .01
  1876     63,967    32,912      51.45          + .02
  1877     63,772    32,508      50.98          - .45
  1878     63,144    32,505      51.48          + .05
  1879     64,363    33,114      51.45          + .02

  Average  57,583      ...       51.43           ...



288 



ELEMENTS OF STATISTICS. 



Calculated modulus $\sqrt{2 \times 57583 \times .5143 \times .4857} = 169.61$ for
a total of 57,583. Equivalent to .2945 ... for a total of 100.

In the formula $\sqrt{2pqn}$, $p$, the chance that any child born is a
boy, is taken as .5143, since 51.43 is the average percentage male
births are of total births. $q = 1 - p = .4857$. The number of
experiments $n$ is taken to be the average number of births per
year. Then $\sqrt{2pqn}$ is found to be 169.61. This is the modulus
to apply to the whole number of births; but since this differs
year by year it is convenient to reduce all numbers to percentages.
The examples are then arranged as "between average
+ modulus and average - modulus," "between average + $\frac{9}{10}$ of
modulus and average - $\frac{9}{10}$ of modulus," and so on; the first
group (± .460) is taken so as to include the extreme.
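The arithmetic of the calculated modulus can be repeated as follows; this is only a check of the figures quoted above, with variable names chosen for the sketch.

```python
from math import sqrt

p = 0.5143   # chance that a child born is a boy
q = 1 - p    # chance that a child born is a girl
n = 57583    # average number of births per year

modulus = sqrt(2 * p * q * n)          # for a total of 57,583
modulus_percent = 100 * modulus / n    # for a total of 100

print(round(modulus, 2), round(modulus_percent, 4))
```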









                                        Calculated.
     Within.        Observed.      x.       F(x) × 2.

  51.44 ± .460         20         ...      .972 of 20 = 19.4
  51.44 ± .295         17         1.       .843   ,,  = 16.8
  51.44 ± .265         17          .9      .797   ,,  = 15.9
  51.44 ± .236         15          .8      .741   ,,  = 14.8
  51.44 ± .206         13          .7      .678   ,,  = 13.6
  51.44 ± .177         12          .6      .604   ,,  = 12.1
  51.44 ± .147         10          .5      .521   ,,  = 10.4
  51.44 ± .118          9          .4      .428   ,,  =  8.6
  51.44 ± .088          9          .3      .329   ,,  =  6.6
  51.44 ± .057          6          .2      .223   ,,  =  4.5
  51.44 ± .029          4          .1      .112   ,,  =  2.2



The numbers to be expected from theory are given alongside;
the fit is fairly close, and can be tested by the method
described on p. 279.

The modulus calculated by the formula $\sqrt{\frac{2\Sigma\delta^2}{n}}$ is .305.
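The second value of the modulus follows from the column of differences from the average in the table of births. A brief sketch, with the twenty differences entered as printed:

```python
from math import sqrt

# Differences from the average percentage (51.43), 1860 to 1879.
differences = [0.23, -0.26, 0.07, 0.19, 0.25, -0.05, 0.44, -0.23, -0.17, -0.41,
               0.17, 0.06, 0.01, -0.06, 0.14, 0.01, 0.02, -0.45, 0.05, 0.02]

# Modulus from the observations themselves: sqrt(2 * sum of squares / n).
modulus_observed = sqrt(2 * sum(d * d for d in differences) / len(differences))
print(round(modulus_observed, 3))
```

This may be set beside the value .2945 obtained from $\sqrt{2pqn}$.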

II. If a digit is taken at random, the chance that it will be
less than 5 (0, 1, 2, 3, or 4) is $\frac{1}{2}$. If we take a book of logarithms
and note the digits in the 7th decimal places in successive
numbers, we shall have a practically random selection of digits.
If we take groups of 50 numbers, the chances of 0, 1, 2 ... 50
occurrences of digits less than 5 are given respectively by the
terms of the expansion $(\frac{1}{2} + \frac{1}{2})^{50}$. This experiment was repeated
300 times and the results tested in accordance with both the



EQUATION Ot" TliE COrVE OF feRROk. ^89 

fdfmutaEnbr the "modulus^ viz., J2pqn =J2 x J x~J x 56 = S,~aii'd 

^^ , which was found to be 4.75. 
n 

The values of the other standard errors were both found 
directly and deduced from c, with very fair correspondence be-' 
tween the two methods. 



Number of Occurrences of 0, 1, 2, 3, or 4 in the 7th
Decimal Places of Groups of 50 Logarithms.

                                                      Averages of
                                                         Lines.
  29   19   25   25   22   28   16   23   22   27        23.6
  24   28   30   22   20   27   24   24   27   22        24.8
  27   26   28   21   21   22   22   27   25   25        24.4
  25   28   21   23   22   23   27   27   25   25        24.6
  28   23   26   23   22   29   28   25   23   26        25.3
  24   22   22   19   26   24   26   28   20   25        23.6
  24   23   27   29   26   21   26   31   23   27        25.7
  26   26   30   25   25   24   29   25   21   27        25.8
  30   24   25   27   24   30   24   28   24   30        26.6
  26   21   21   22   31   28   26   26   26   33        26.0
  19   25   26   34   21   28   21   29   19   23        24.5
  29   26   19   29   24   27   25   25   24   22        25.0
  24   27   21   23   25   21   26   28   25   27        24.7
  25   28   29   30   28   27   28   23   25   26        26.9
  30   30   18   22   24   23   25   27   25   31        25.5
  25   26   27   21   23   25   24   20   25   22        23.8
  28   24   20   18   25   19   25   30   29   25        24.3
  22   27   24   28   22   20   23   25   26   26        24.3
  16   27   28   27   23   20   29   26   20   24        24.0
  26   28   28   23   21   24   25   21   14   28        23.8
  25   24   21   24   24   21   24   30   25   26        24.4
  28   32   17   23   29   24   22   33   29   29        26.6
  25   26   26   27   22   20   24   26   24   24        24.4
  28   26   25   25   29   25   24   22   26   25        25.5
  22   30   22   27   25   27   27   27   28   21        25.6
  35   26   23   26   31   28   26   22   22   29        26.8
  26   20   28   23   22   28   24   23   30   16        24.0
  28   26   25   31   27   28   22   26   30   19        26.2
  26   27   27   18   25   24   27   22   30   30        25.6
  26   25   24   17   27   32   25   21   23   30        25.0

                     Averages of Columns.

 25.9 25.7 24.4 24.4 24.5 24.9 24.8 25.7 24.5 25.7






     1.          2.        3.          4.
                                    Squares of   Product of Col. 2   Product of Col. 2
               Times.    Errors.      Error.        and Col. 3.         and Col. 4.

  14 occurs       1      -11.04      121.88          -11.04              121.88
  16    "         3      - 9.04       81.72          -27.12              245.16
  17    "         2      - 8.04       64.64          -16.08              129.28
  18    "         3      - 7.04       49.56          -21.12              148.68
  19    "         7      - 6.04       36.48          -42.28              255.36
  20    "         9      - 5.04       25.40          -45.36              228.60
  21    "        18      - 4.04       16.32          -72.72              293.76
  22    "        26      - 3.04        9.24          -79.04              240.24
  23    "        21      - 2.04        4.16          -42.84               87.36
  24    "        32      - 1.04        1.08          -33.28               34.56
  25    "        42      -  .04        ...           - 1.68               ...
  26    "        36      +  .96         .92          +34.56               33.12
  27    "        30      + 1.96        3.84          +58.80              115.20
  28    "        28      + 2.96        8.76          +82.88              245.28
  29    "        15      + 3.96       15.68          +59.40              235.20
  30    "        16      + 4.96       24.60          +79.36              393.60
  31    "         5      + 5.96       35.52          +29.80              177.60
  32    "         2      + 6.96       48.44          +13.92               96.88
  33    "         2      + 7.96       63.36          +15.92              126.72
  34    "         1      + 8.96       80.28          + 8.96               80.28
  35    "         1      + 9.96       99.20          + 9.96               99.20

                300                           Sum of Errors, all        3387.96
                                              considered positive,   Sum of Squares.
                                                    785.12



Average, 25.04. Median, 25. Quartiles, 27, 23. Hence probable error is
approximately 2.

Mean error $= \frac{785.12}{300} = 2.617$.

Error of mean square $= \sqrt{\frac{3387.96}{300}} = 3.36 = \epsilon$.

Modulus $= \epsilon \times \sqrt{2} = 4.75 = c$.

Also probable error $= .4769\,c = 2.265 = r$, and mean error $= r \div .8453 = 2.68$, nearly
the values obtained directly.
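As a check on the direct calculation, the same quantities can be recomputed from the frequency table. The sketch below is only illustrative: the variable names are not from the text, and because it works from the unrounded average its results agree with the printed figures to about two decimal places rather than exactly.

```python
import math

# Frequency table from the text: number of digits below 5 in a group of 50
# (key) and the number of the 300 groups showing that number (value).
frequencies = {
    14: 1, 16: 3, 17: 2, 18: 3, 19: 7, 20: 9, 21: 18, 22: 26, 23: 21,
    24: 32, 25: 42, 26: 36, 27: 30, 28: 28, 29: 15, 30: 16, 31: 5,
    32: 2, 33: 2, 34: 1, 35: 1,
}

total = sum(frequencies.values())
average = sum(k * f for k, f in frequencies.items()) / total

# Mean error: average of the deviations, all reckoned as positive.
mean_error = sum(abs(k - average) * f for k, f in frequencies.items()) / total

# Error of mean square (standard deviation) and modulus.
sum_squares = sum((k - average) ** 2 * f for k, f in frequencies.items())
standard_deviation = math.sqrt(sum_squares / total)
modulus = standard_deviation * math.sqrt(2)

# Probable error deduced from the modulus, using the constant in the text.
probable_error = 0.4769 * modulus

print(total, round(average, 2), round(mean_error, 2))
print(round(standard_deviation, 2), round(modulus, 2), round(probable_error, 2))
```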



The following table compares the distribution with that of
the normal curve, by a method differing from those previously
used.

Divergence of 1 from average corresponds to $x = \frac{1}{4.75} = .21$.



   Between                  Observed.
  Average and      Above        Below
   Average        Average.     Average.    Total.      x.         2F(x).

     ± 1             36     +     42    =    78        .21     .117 of 600 =  70
     ± 2             66     +     74    =   140        .42     .224    "   = 134
     ± 3             94     +     95    =   189        .63     .314    "   = 188
     ± 4            109     +    121    =   230        .84     .383    "   = 230
     ± 5            125     +    139    =   264       1.05     .431    "   = 259
     ± 6            130     +    148    =   278       1.26     .463    "   = 277
     ± 7            132     +    155    =   287       1.47     .481    "   = 289
     ± 8            134     +    158    =   292       1.68     .491    "   = 295
     ± 9            135     +    160    =   295       1.89     .496    "   = 298
     ±10            136     +    163    =   299       2.10     .499    "   = 299
     ±11            136     +    163    =   299       2.31     .499    "   = 300
     ±12            136     +    164    =   300       2.52     .499    "   = 300



The fit is close. The symmetry is spoilt by the great 
number 42 at 25 occurrences, just below the average. 
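The comparison in the table can be sketched numerically. The block assumes that twice the tabulated F(x) is the error function `math.erf`, so that the expected number of groups within ±k of the average is 300 × erf(k ÷ 4.75); the names used are illustrative only, and since x is not rounded to two places here the results may differ by a unit from the printed column.

```python
import math

modulus = 4.75   # modulus found from the observations in the text
groups = 300     # number of groups of 50 logarithms

# Observed numbers of groups within +-1, +-2, ... +-12 of the average
# (the "Total" column of the table).
observed_within = [78, 140, 189, 230, 264, 278, 287, 292, 295, 299, 299, 300]


def expected_within(k, c, n):
    """Expected number within k of the average for a normal curve of modulus c."""
    return n * math.erf(k / c)


comparison = [
    (k, obs, round(expected_within(k, modulus, groups)))
    for k, obs in enumerate(observed_within, start=1)
]

for k, obs, exp in comparison:
    print(f"+-{k:2d}  observed {obs:3d}  expected {exp:3d}")
```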

The line L^Lg . . . Lgg on the diagram of the curve of error 
at the end of the book shows these numbers. The moduli are 
made to correspond, which defines the abscissae, and the scale 
of ordinates is then decided by making the areas of the two 
figures equal, but this was not done exactly. 

Summary of Terms. 

For convenience of reference the principal quantities con- 
nected with the curve of error are collected below. 

If we take $y = \frac{1}{c\sqrt{\pi}} e^{-\frac{x^2}{c^2}}$ as the equation of the curve, $c$, which
determines the unit of abscissae, is called the modulus.

If $\delta_1, \delta_2 \ldots \delta_n$ are the differences between the observations and
their arithmetic average, $c$ should be taken as $\sqrt{\frac{2\Sigma\delta^2}{n}}$ or
$\sqrt{\frac{2\Sigma\delta^2}{n-1}}$, where $n$ is the number of observations. (See pages
285 and 307.)

If the curve can be also determined as the limit of an
assigned binomial expansion $(p+q)^n$, then $c$ may be taken as
equal to $\sqrt{2pqn}$. (See page 287.)

$\frac{2\Sigma\delta^2}{n}$ or $\frac{2\Sigma\delta^2}{n-1}$ is called the fluctuation, and equals $c^2$. (See
pages 285 and 307.)

$h$, $= \frac{1}{c}$, is the precision.

The square root of the average of the squares of the $\delta$'s,
$= \sqrt{\frac{\Sigma\delta^2}{n}}$, is called the error of mean square, or the standard
deviation, and is represented by the letters $\epsilon$ or $\sigma$.

Hence $c = \epsilon\sqrt{2}$.

The arithmetic average of all the $\delta$'s, all reckoned as positive,
is called the average error. It is equal to the distance of the
centre of gravity of half the curve from the central ordinate. It is
represented by the letter $\eta$, and $\eta = \frac{c}{\sqrt{\pi}} = .5641896\,c$.

The probable error is half the distance between the quartiles
of the observations. The ordinates through the points whose
abscissae are the probable errors bisect the two symmetrical
halves of the curve. The probable error is represented by the
letter $r$, and $r = .4769363\,c$. This is generally written $r = \rho c$, where
$\rho = .4769363$.

$c$, $h$, $\epsilon$, $\eta$ and $r$ can all be calculated directly from the
observations. In general the values so calculated will not satisfy the
above numerical relations exactly; the correspondence depends
on the closeness of the fit of the observations to the curve.
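The numerical relations collected in this summary can be put into a small conversion routine. This is a sketch only: the function name and dictionary keys are invented for illustration, and it takes the fluctuation as the square of the modulus and the precision as its reciprocal.

```python
import math

RHO = 0.4769363  # constant quoted in the text: probable error = RHO x modulus


def terms_from_modulus(c):
    """Quantities of the curve of error implied by a given modulus c."""
    return {
        "fluctuation": c ** 2,
        "precision": 1.0 / c,
        "standard_deviation": c / math.sqrt(2),     # error of mean square
        "average_error": c / math.sqrt(math.pi),    # = .5641896 c
        "probable_error": RHO * c,
    }


# Modulus 4.75, as found for the logarithm experiment on the preceding pages.
terms = terms_from_modulus(4.75)
for name, value in terms.items():
    print(f"{name:20s} {value:.4f}")
```

Values computed directly from a set of observations will in general agree with these only approximately, as the closing paragraph of the summary remarks.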






Section III. — To what Groups does Law of 

Error apply? 

[The meaning of luck.]

Returning to our discussion on the relation between the laws
of probability and the numerical facts of actual experience, let
us consider the meaning of such phrases as "a rare
occurrence," "an improbable event," "a run of luck,"
"a lucky man," and similar expressions which show that some
events are regarded as ordinary, others as extraordinary. On
this subject there is a great deal of popular confusion; thus
the Spectator opens its columns to people who write about
extraordinary coincidences, e.g., that on 3rd March in two
successive years two persons of the same name died at the same
age in neighbouring villages; and recently the concurrence of
the two names Arthur and Mallory in a dispatch was instanced
as remarkable. Now, a priori, these two names are just as
likely to be mentioned together as any other two borne by
equal numbers of persons. If out of $n$ persons, $p$ bear the
one and $q$ the other, the chance that the first two names given
in an assigned place will be these is about $\frac{p}{n} \times \frac{q}{n}$; but the
chance that they will occur together in the newspapers in a
given week is much greater, viz., $\frac{pq}{n^2} \times N$, where $N$ is the
number of pairs of names in conjunction in all the columns
of the press together. Going a step further, consider the
number of pairs of names that, when placed together, would
recall some event of historic or other interest, and suppose this
to be $M$; then the chance that some such coincidence should
arise in a given week is $\frac{pq}{n^2} \times N \times M$, if we suppose for the
sake of argument that $p$ and $q$ are the same for all the pairs
concerned. From these remarks it will be seen that before
we can speak of an event as extraordinary, we must define the
time, place, circumstances, and nature of such events. Further,
suppose we decide to regard an event as unusual if the chance
of its occurrence thus defined is less than $\frac{1}{r}$, where $r$ is a large
number, it is easily seen that we may expect the improbable,
to speak paradoxically; for great though $r$ may be, the
number of events which come under our cognizance is also
great; and we may therefore expect to find on an average
one improbable event for every $r$ we notice; hence it is possible
for a weekly newspaper with the help of the widely-extended
search for sensations of its intelligence department to supply
us week by week with its quantum of horrors. Another aspect
of the same subject will be seen when we deal with the
permanence and regularity of certain small numbers in Section IV.
[The idea of rarity.]

The rarity of an event is often unconsciously determined
by a mental forecast of its occurrence. If I take four cards
out of a well-shuffled pack and find them to be
in succession, ace, king, queen, knave of hearts, I
should feel surprised; not because these four cards are less
likely to come than any other four assigned cards whatever,
but because I have certain associations with them in that they
form a sequence which is valuable in certain games, and are
the highest cards of a suit; there are noted in my mind
unconsciously many groups of four cards of such special
significance. If there are $s$ such groups, the chance that one of
them will occur is $\frac{s}{{}^{52}C_4}$ if we do not, and $\frac{s}{{}^{52}P_4}$ if we do, regard
the order of their occurrence.

The real difference between a rare and a common event
is, however, independent of any mental process or prejudices.
If I place 8 coins one after another in front of me, it is no
more unlikely that I shall get 8 heads than that I shall get
any other assigned order of heads and tails, say h t t h t h t t;
the chance of either is $\frac{1}{2^8}$; but it is much more unlikely that
I shall get 8 heads than that I shall get 5 tails and 3 heads
without regard to the order in which they come; for out of
($2^8 =$) 256 possible arrangements, only 1 gives 8 heads, but
($\frac{8!}{5!\,3!} =$) 56 give 5 tails and 3 heads.
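The counting in this paragraph can be verified directly. The sketch below uses Python's `math.comb` for the number of arrangements; the variable names are illustrative.

```python
import math

tosses = 8
arrangements = 2 ** tosses                 # every assigned order of heads and tails

ways_eight_heads = math.comb(tosses, 8)    # arrangements giving 8 heads
ways_three_heads = math.comb(tosses, 3)    # arrangements giving 3 heads and 5 tails

print(arrangements, ways_eight_heads, ways_three_heads)
print("chance of 8 heads:", ways_eight_heads / arrangements)
print("chance of 3 heads, 5 tails in any order:", ways_three_heads / arrangements)
```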

[The greatness of the improbability dealt with by the law of error.]

Apply this argument to our hypothesis as to great numbers.
Suppose the population to be composed of males and females
in equal numbers, and that 1,000 persons are
selected on some system quite unconnected with sex.
Out of $2^{1000}$ ($= 10^{301}$) possible selections (differing
from one another only in arrangement in order of
sexes) only 1 gives 1,000 males, but $\frac{1000!}{500!\,500!}$ ($= 10^{299} \times 2.7$)
arrangements give 500 of each sex, independently of the order. The
chance of the first occurrence is $\frac{1}{10^{301}}$,* of the second about
1 in 37. In statistics we are concerned with the totals, depending
only on the combinations of the items, not on their order
(the permutations); and occurrences of the numbers near the
average (500, 499, 501, &c.) are separately and much more
conjointly very much more probable than occurrences of the
numbers far from it. The vast improbability of very great
divergence can be seen by a numerical study of the curve of
error (see p. 281).
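Exact integer arithmetic makes it easy to see the sizes of the numbers involved. The following sketch (with illustrative names) evaluates the two chances without approximation; computed exactly, the chance of 500 of each sex comes to about 1 in 40, and the figure of about 1 in 37 corresponds to rounding $2^{1000}$ down to $10^{301}$.

```python
import math
from fractions import Fraction

persons = 1000
selections = 2 ** persons                         # all orders of the two sexes
equal_split = math.comb(persons, persons // 2)    # orders with 500 of each sex

chance_all_male = Fraction(1, selections)
chance_equal_split = Fraction(equal_split, selections)

# "1 in ..." form of the chance of exactly 500 of each sex.
one_in = float(1 / chance_equal_split)

print("digits in 2^1000:", len(str(selections)))
print("digits in 1000!/(500! 500!):", len(str(equal_split)))
print("chance of 1,000 males:", float(chance_all_male))
print("chance of 500 of each sex: 1 in", round(one_in, 1))
```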

Hence the theorems relating to great numbers rest on a 
very much firmer basis than they would if divergence was 
due to that sort of coincidence which produces a so-called 
rare event. 

* Chance of 1,000 m. or 1,000 f. is twice this.

[A common fallacy.]

A "run of luck," good or bad, may be regarded as a
succession of improbable events, and is a more scientific expression
than a "rare event" as commonly understood. Of
a great number of events, deals of cards, investments,
bets, and so on, very many will give normal results,
average success at cards, normal returns to investments and
so on; very few will give abnormal winnings. The chance
of abnormal success in one venture being $p$, a small fraction,
the chance of a succession of $n$ successes is $p^n$, very much
smaller when $n$ is at all large. It is in the phrase "lucky
man" that the error is introduced. One who has benefited
by the occurrence of a rare event may reasonably be called
lucky, and the number of lucky men will be roughly proportional
to the number of fortunate rare events; but when a succession of
events, say three, each of probability $\frac{1}{100}$, and conjointly of
probability $\frac{1}{1000000}$, or a broken succession (e.g., $ppqpp$, of
which the chance would be about $\frac{1}{100000000}$) has taken place in one man's
favour, the imagination loses the logic of the case, and
supposes an overruling law, and marks out that particular man as
not subject to the law of probability: one is apt to expect
that the next event will also be a success, and to be
further confirmed in this opinion by paying attention to
the one instance when the sixth event is a success, and
neglecting the ninety and nine when it is a failure. Other
people are biassed in the opposite direction, and have
distinctly too great an expectation of a counterbalancing tendency,
a long run of failures till the average is restored. It is thus
correct to speak of a man having been lucky, but tempting
Nemesis to speak of him as a lucky man. It is a mere
truism to say that, unless a success or failure have some
causal influence over future successes or failures (as when a
good stroke at a game steadies the nerves for another), the
probability of each future event is totally unaffected by what has
gone before.

[Statistical coefficients.]

Let us return now to the method of deducing the chance $p$,
and the index $n$, used in the expansion $(p+q)^n$, from records
such as the death-rate. Notice first that the
deduction of $p$ (the chance to be applied to each
individual to find the varying degrees of probability of the
possible totals) from the numbers, implies some hypothesis
as to the genesis of these numbers, the very theorem which
we wish it to illustrate; for suppose that in the records of
20 years we find 600,000 deaths in a stationary population of
1,000,000, we assume that this is the most probable number
which a chance régime would give, and since the most probable
number is the total $\times\, p$, we deduce that
$p = \frac{600000}{1000000} \times \frac{1}{20} = \frac{3}{100}$; but
here we are making some undefined assumption about the
occurrence of events similar to that defined by the curve of
error. If we actually assumed the law of error, we can
calculate how far the value of $p$ so estimated may be expected to
differ from its true value. This accounts in part for the
divergence between the calculated grouping and the fact.

Again, there is great difficulty in determining the number $n$,
the number of persons to whom the chance of the occurrence of
a particular event is applied, and we should further notice
that in many cases, in particular in concrete measurements,
such as height and normal length of life, we have no
information whatever as to $n$, which in this case is the number
of causes which may add or subtract undefined units from
height or age; and we are often equally in doubt when
dealing with great numbers, e.g., with the total value of
imports. In such cases we should have to deduce both $p$
and $n$ from the records of results; and indeed it is simpler
to fall back on other methods of deducing the law of error than
the present one of regarding it as the limit of the binomial
expansion, determining the modulus without any assumption as
to the number of independent causes. Hence in a great many
instances we cannot expect to find close conformity to a
predetermined curve $(p+q)^n$. Similarly we can deduce from the
laws of gravitation and motion that a planet's orbit must be
an ellipse, but cannot determine the eccentricity of this ellipse
except by observation.

[Cause of apparent non-correspondence of fact and theory.]

A far-reaching cause of the apparent discrepancy between fact
and theory is, however, of a different kind. The theory applies
to experiments performed under unchanging
conditions; if we are drawing differently coloured
balls from a bag containing a great number, all
the external circumstances must be unchanged, and
the only variation that which comes from the so-to-say regulated
randomness of the forces which decide shuffling and drawing.
Now in human affairs, when we consider a series of death-rates
or any other rates distributed in time, we are dealing with a
constantly changing environment of social and sanitary habits,
within which the apparently random forces that decide death
are acting; and these external changes may affect the
interaction of the random forces, just as a change in barometric
pressure may affect the molecular forces of a rigid body. Such
effects cannot be foretold or calculated; we may expect that
improvements in sanitation will diminish the death-rate, but
some detail may increase it; vaccination may diminish
small-pox, but increase the liability to some other disease. To such
reasons as these should be assigned the non-correspondence to
the law of error of great numbers distributed in time. When
the element of time is eliminated by a process of random
averaging the correspondence is closer. Great numbers
distributed in space are exempt from this disturbing cause and might
be expected to show closer correspondence; for instance the
birth-rates in a number of districts might be expected to
conform more closely than rates for one place for a series of years;
but it is very difficult to obtain sufficiently homogeneous
figures distributed in space, though Prof. Lexis gives some
instances of this kind.*

* Massenerscheinungen, p. 66.




Physiological and anthropometrical measurements, such as 
the heights of 10,000 children of the same age, are not affected 
by these difficulties, and should show close correspondence with 
the theoretical distribution ; and it is not surprising that the ratio 
of the number of male to the number of female births, depending 
as it does on hidden causes not easily influenced by the progress 
of civilisation, should show that remarkable consilience with the 
law of error, which has so often been remarked. Finally, the 
occurrence of sequences and groups of numbers, such as those 
obtained from logarithmic tables, being absolutely independent 
of changes in time or space, naturally show complete agreement 
with theory. 

[The use of the law of error.]

All these considerations make the application of the law of
error to actual measurements a very delicate operation, and it
may appear that the cases where agreement is
close are so few as to make the whole body of
theory useless; but this is an unscientific view to take. The
general process of applied science is to frame hypotheses as
nearly consistent with the facts as is possible without such
complications as will prevent their use, and then apply to the
idealized case the corrections which the actual cases necessitate.
This process has led to the best results in physical science. In
the problems dealt with by the law of error, it will be found that
many deductions from the idealized cases hold also when applied
to the only partially corresponding records of great numbers;
just as, in mechanics, many theorems relating to smooth bodies
can be applied unchanged to rough bodies. For instance, the
"fluctuation" of non-corresponding figures can be calculated by
the formula $\frac{2\Sigma\delta^2}{n}$; and the accuracy of an average of random
samples of quantities not grouped according to the curve of
error varies as the square root of the number of samples taken.*

From this discussion we may gather that we can seldom tell 
a priori whether the law of error will or will not apply to a given
series of figures. This must be determined by experiment for 
each new class of records ; but when we have found correspond- 
ence in many series of a class (as is the case in measurement of 
heights) we may proceed with confidence to apply the law to 
other similar series or groups. 

* See p. 308. † Ibid., p. 28.

[Concrete measurements and great numbers.]

An important distinction is drawn by Prof. Lexis † and
emphasised by Prof. Edgeworth * between two classes of figures to
which the laws of great numbers apply. The
first, called by Lexis concrete, contains such
quantities as height measurements of a great number
of persons, and normal length of life, where a definite mean or
type seems to be normal and other measurements to be
variations from this type. In these cases it is not easy to connect
the facts which correspond with the exponential curve $y = e^{-x^2}$,
where $x$ is the divergence from the type, with our deduction from
the limit ($n$ infinite) of $(p+q)^n$. Suppose, however, that height
is determined by $n$ forces, each capable of adding or of
subtracting 1 unit, say 1 millimetre, from normal height, and that the
chance that each shall act is $p$; then the divergencies obtained
in a number of individuals should be distributed according to
the coefficients in this expansion.

The other class, called by Lexis combinational, to which the
discussion in Section II. above more directly applies, contains those
totals which are the sum of a great number of items (persons,
deaths, births, &c.), for the existence of each of which a definite
chance, $p$, can be assigned a priori. The numbers may then be
expected (subject to the disturbing causes already discussed)
to be grouped in accordance with the curve
$y = \frac{1}{\sqrt{2\pi pqn}}\, e^{-\frac{x^2}{2pqn}}$,
where $n$ is the total number of persons to whom the chance $p$
applies. In such cases $p$ is the arithmetic average of $p_1, p_2 \ldots p_n$,
where $p_1 z, p_2 z, \ldots p_n z$ are the numbers of events which are
found respectively in $n$ series each of $z$ observations. $p$ is the
"probability coefficient" of the event, and $p_1, p_2 \ldots p_n$ should
conform to a curve with modulus $\frac{\sqrt{2pqz}}{z}$. On the other
hand, if $c$ is calculated from the formula

$c^2 = \frac{2}{n}\{(p_1 - p)^2 + (p_2 - p)^2 + \ldots + (p_n - p)^2\}$

we are treating the $p$'s as "concrete" quantities, and obtain a
second value for the modulus.

If $c - \frac{\sqrt{2pqz}}{z} = 0$, the distribution of the
coefficients $p_1, p_2 \ldots p_n$ is normal, which is not often the case.

* Jubilee Volume of Statistical Journal, p. 191.

If this quantity <0, the coefficients are grouped more closely 
together than the theory of error leads us to expect, and there 
is some evidence that a force preventing divergence has been 
called into play. 

More generally this quantity is >0, the coefficients are more 
divergent than in accordance with the theory of error, and some 
disturbing forces have acted. 




Section IV.— The Permanence of Certain Small 

Numbers. 

[The binomial expansion and small numbers.]

A remarkable side-light is thrown on our general argument
by the actual permanence of small numbers. Little attention
has been given to this phenomenon, but it is a
very striking fact that if among a great number
of items there are a few which possess some
particular feature, it will be found that this small number is
seldom much exceeded and seldom entirely vanishes.

The following numerical example shows that this may
be expected theoretically, and an examination of the successive
terms of $(p+q)^n$ when $p$ is very small and $q$ nearly equal to 1
will show the same phenomenon more generally.

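Constancy of Small Numbers.

$\left(\frac{1000}{1001} + \frac{1}{1001}\right)^{4000} = \frac{1}{1001^{4000}}\,(1000 + 1)^{4000}$.

1st term $= 1000^{4000} \div 1001^{4000}$.

$\log \left(\frac{1000}{1001}\right)^{4000} = 4000\,(3 - 3.0004341) = -1.7364 = \bar{2}.2636 = \log .018348$.

Chance of
No occurrences, 1st term $= a$, suppose                            = .0183
1 occurrence, 2nd term $= 4000 \times 1000^{3999} \div 1001^{4000} = 4a$     = .0734
2 occurrences, 3rd term $= \frac{4000 \times 3999}{1 \cdot 2} \times 1000^{3998} \div 1001^{4000}$
      $= \frac{4000 \times (4000 - 1)}{1 \cdot 2} \times 1000^{3998} \div 1001^{4000}$
      $= 8 \times 1000^{4000} \div 1001^{4000}$, correct to 1 in 4000, $= 8a$   = .1467
3 occurrences, 4th term $= \frac{4000 \times 3999 \times 3998}{1 \cdot 2 \cdot 3} \times 1000^{3997} \div 1001^{4000}$
      $= \frac{4000 \times (4000^2 - 3 \times 4000)}{1 \cdot 2 \cdot 3} \times 1000^{3997} \div 1001^{4000}$
      $= \frac{4^3}{1 \cdot 2 \cdot 3} \times 1000^{4000} \div 1001^{4000}$, less 3 in 4000 approx.   = .1956
4 occurrences, 5th term $= \frac{4^4}{4!}\,a$, less 6 in 4000 approx.          = .1954
5 occurrences, 6th term $= \frac{4^5}{5!}\,a$, less 10 in 4000 approx.         = .1562
6 occurrences, 7th term $= \frac{4^6}{6!}\,a$, less 15 in 4000 approx.         = .1040
7 occurrences, 8th term $= \frac{4^7}{7!}\,a$, less 22 in 4000 approx.         = .0593
8 occurrences, 9th term $= \frac{4^8}{8!}\,a$, less 30 in 4000 approx.         = .0296
                10th term                                          = .0131
                11th term                                          = .0052
                12th term                                          = .0019
                13th term                                          = .0006
                14th term                                          = .0002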

Terms 15 to 4001 together only occur about 1 in 10000.

Chance of 3, 4, 5, or 6 occurrences = .65 approx.
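The successive terms can also be evaluated without logarithms. The sketch below computes the terms of the binomial expansion for 4000 trials with chance 1/1001, using exact binomial coefficients from `math.comb`; the function name is illustrative, and the results may differ from the printed figures in the fourth decimal place.

```python
import math

n = 4000
p = 1 / 1001


def binomial_term(k, n, p):
    """Chance of exactly k occurrences: the (k+1)th term of (q + p)^n."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


# The first fourteen terms, i.e. 0 to 13 occurrences.
terms = [binomial_term(k, n, p) for k in range(14)]

for k, term in enumerate(terms):
    print(f"{k:2d} occurrences  {term:.4f}")

print("3, 4, 5, or 6 occurrences:", round(sum(terms[3:7]), 3))
print("terms 15 to 4001 together:", round(1 - sum(terms), 5))
```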






To take an actual example: Out of some 530,000 deaths
annually from all causes the following are the numbers from
splenic fever in the years 1875-1894:

5, 4, 10, 14, 12, 18, 9, 15, 8, 18, 11, 11, 11, 12, 7, 4, 3, 6, 7, 10.
Average 10.

Here $p = \frac{10}{530000}$; $\therefore q = \frac{52999}{53000}$.

$n$ is doubtful, and may be taken to be the total number
of deaths or the total population; but it will be found that
the following numbers are unaffected, whichever number we
adopt.

The successive terms in the expansion $(p+q)^{530000}$ are given in the second column.


   Chance of                         Number of occurrences
                                       to be expected        Number
                                        in 20 years.        observed.

   0 deaths is       .000045               ...                  0
   1     "           .00045                ...                  0
   2     "           .00225                ...                  0
   3     "           .0075                 ...                  1
   4     "           .0185          .6 (0 to 4 together)        2
   5     "           .037                  ...                  1
   6     "           .061                  ...                  1
   7     "           .087                  ...                  2
   8     "           .11                   ...                  1
   9     "           .12                   ...                  1
  10     "           .12                   ...                  2
  11     "           .11                   ...                  3
  12     "           .09                   ...                  2
  13     "           .07                   ...                  0
  14     "           .05                   ...                  1
  15     "           .03                   ...                  1
  16     "           .02                   ...                  0
  17     "           .01                   ...                  0
  18     "           .00                   ...                  2
  More than 18       0                     ...                  0


Considering the small number of years taken, and the in- 
definiteness of many of the death returns, the general consilience 
between the last two columns is satisfactory ; while the general 
principle that small numbers show a certain constancy is well 
exemplified. Specialists in all professions, from the doctor who 
treats only one obscure disease of the ear, to the dealer in 
curiosities, make their livelihood dependent on this permanence 
of small numbers. 
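The second and last columns of the splenic fever table can be recomputed as a rough check. The sketch takes $n$ = 530,000 and $p$ = 10/530,000 as in the text, evaluates the binomial terms with `math.comb`, and multiplies by the 20 years to obtain expected numbers of occurrences; the names are illustrative, and no attempt is made to reproduce the grouping of the printed table.

```python
import math

# Annual deaths from splenic fever, 1875-1894, as listed in the text.
deaths = [5, 4, 10, 14, 12, 18, 9, 15, 8, 18,
          11, 11, 11, 12, 7, 4, 3, 6, 7, 10]

n = 530000          # total deaths annually, taken as the index of the expansion
p = 10 / n          # chance of a death from splenic fever, average taken as 10
years = len(deaths)


def chance(k):
    """Chance of exactly k deaths in a year: a term of (p + q)^n."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


expected = {k: years * chance(k) for k in range(19)}
observed = {k: deaths.count(k) for k in range(19)}

for k in range(19):
    print(f"{k:2d}  chance {chance(k):.6f}  "
          f"expected {expected[k]:.2f}  observed {observed[k]}")
```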

The regular occurrence of accidents and of improbable events 
in general furnishes other examples of the same sort. 

Note. Since writing this section my attention has been called to a
treatise by Dr Bortkewitsch, Das Gesetz der kleinen Zahlen, Leipzig, 1898,
where the close agreement of the records of accidents and other occasional
events to the binomial expansion is dealt with in a more exhaustive and
analytical manner.






Section V. — Extension of the Law of Error and 

Applications. 

[Statement of law of error.]

We have only shown so far that great numbers fluctuate
about their mean in accordance with the law of error on the
assumption that for the existence or non-existence
of each particular unit there is the same numerical
chance $p$. We can, however, prove by elementary
methods that the same distribution is reached under many other
circumstances, and at the same time make several important
deductions.

Suppose that a quantity whose mean value is H is
determined by the action of a great number of causes; let the causes
produce deviations $\epsilon_1, \epsilon_2, \ldots$ which are connected with $\eta$, the
corresponding deviation of H, by the equation $\eta = a_1\epsilon_1 + a_2\epsilon_2 + \ldots$,
where $a_1, a_2 \ldots$ are constants. If each of the deviations $\epsilon_1, \epsilon_2 \ldots$
can be of various magnitudes, the curves which show the
probabilities of the occurrence of these magnitudes are called
"curves of frequency," or "facility curves." If the curves of
frequency are normal curves of error, the chances of the occurrence
of the deviations $\epsilon_1, \epsilon_2 \ldots$ are proportional to
$e^{-\frac{\epsilon_1^2}{c_1^2}}, e^{-\frac{\epsilon_2^2}{c_2^2}} \ldots$,
where $c_1, c_2 \ldots$ are the moduli of these curves. The following
proof holds when these assumptions are justified; but the
resulting theorems hold (1) when the $\epsilon$'s belong to any curves of
frequency such that their limits are narrow, while the number of
$\epsilon$'s is great, and the limits of each of the $\epsilon$'s is small compared
with $\eta$, and none of the $a$'s are predominant; (2) where $\eta$ is any
function of $(a_1\epsilon_1 + a_2\epsilon_2 + \ldots)$, such that $\eta = a_1\epsilon_1 + a_2\epsilon_2 + \ldots$ is a first
approximation.*

The equation to the normal curve can also be deduced
directly from other considerations, when we are dealing with any
quantity liable to continuous small independent variations.†

We will now show that when the assumptions are limited
as above $\eta$ belongs to a normal curve of error with
modulus $\sqrt{a_1^2c_1^2 + a_2^2c_2^2 + \ldots}$.

* Adapted from a paper by Professor Edgeworth in the London and
Edinburgh Philosophical Magazine, vol. 34, 1892, p. 429.

† See, for instance, Chauvenet's Astronomy, vol. ii., Appendix; and
Merriman's Method of Least Squares, chap. ii.

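Case I. If H is determined by two causes only, $\eta = a_1\epsilon_1 + a_2\epsilon_2$.

Probability that the deviations $\epsilon_1, \epsilon_2$ concur
$= C \cdot e^{-\frac{\epsilon_1^2}{c_1^2}} \times e^{-\frac{\epsilon_2^2}{c_2^2}}$ (where C is a
constant) $=$ (eliminating $\epsilon_2$)

$C \cdot e^{-\frac{\epsilon_1^2}{c_1^2} - \frac{(\eta - a_1\epsilon_1)^2}{a_2^2c_2^2}}
 = C \cdot e^{-\frac{\eta^2}{a_1^2c_1^2 + a_2^2c_2^2}} \times
   e^{-\frac{a_1^2c_1^2 + a_2^2c_2^2}{a_2^2c_1^2c_2^2}\left(\epsilon_1 - \frac{a_1c_1^2\eta}{a_1^2c_1^2 + a_2^2c_2^2}\right)^2}$

This quantity is the probability that a deviation $\epsilon_1$ occurs with a
deviation $\eta$. Giving $\epsilon_1$ all its possible values in turn, we find that the
probability of a deviation $\eta$ is

$C \cdot e^{-\frac{\eta^2}{a_1^2c_1^2 + a_2^2c_2^2}} \times
 \sum_{\epsilon_1 = -\infty}^{\epsilon_1 = +\infty}
 e^{-\frac{a_1^2c_1^2 + a_2^2c_2^2}{a_2^2c_1^2c_2^2}\left(\epsilon_1 - \frac{a_1c_1^2\eta}{a_1^2c_1^2 + a_2^2c_2^2}\right)^2}$

Now the quantity included in the summation is the whole of a
normal curve of error, which has been shifted through a horizontal
distance $\frac{a_1c_1^2\eta}{a_1^2c_1^2 + a_2^2c_2^2}$, and its value depends only on the constants $a_1$,
$a_2$, $c_1$, $c_2$, and is independent of $\eta$ and $\epsilon_1$.

Hence probability of a deviation $\eta = e^{-\frac{\eta^2}{a_1^2c_1^2 + a_2^2c_2^2}} \times$ constant, and
$\eta$ belongs to a normal curve of error with modulus $\sqrt{a_1^2c_1^2 + a_2^2c_2^2}$.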

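Case II. If H is determined by three causes, $\eta = a_1\epsilon_1 + a_2\epsilon_2 + a_3\epsilon_3$,
write $\eta_1$ for $a_1\epsilon_1 + a_2\epsilon_2$; then $\eta = \eta_1 + a_3\epsilon_3$.

By theorem of Case I., modulus for $\eta_1$ is $\sqrt{a_1^2c_1^2 + a_2^2c_2^2}$,
and modulus for $\eta$ is $\sqrt{a_1^2c_1^2 + a_2^2c_2^2 + a_3^2c_3^2}$.

Similarly the theorem can be extended to any number of causes.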

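Corollaries.

1. Let $\bar{x}$ be the weighted average of the quantities $x_1, x_2 \dots$ with weights $a_1, a_2 \dots$, so that $\bar{x} = \frac{\Sigma ax}{\Sigma a}$; suppose that the weights are known accurately, but that $x_1 = x_1' + \epsilon_1$, $x_2 = x_2' + \epsilon_2$, ... where $x_1', x_2' \dots$ are correct values, and $\epsilon_1, \epsilon_2 \dots$ errors belonging to normal curves with moduli $c_1, c_2 \dots$ Then if $\bar{x}'$ be the correct value of $\bar{x}$,

$$\bar{x} = \frac{\Sigma a(x' + \epsilon)}{\Sigma a} = \bar{x}' + \frac{\Sigma a\epsilon}{\Sigma a}.$$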



EXTENSION OF LAW OF ERROR AND APPLICATIONS. 305 
Hence the modulus of $\bar{x}$ is $\dfrac{\sqrt{a_1^2c_1^2 + a_2^2c_2^2 + \dots}}{a_1 + a_2 + \dots}$, for each term in the above theorem can be divided by the constant $a_1 + a_2 + \dots$.*

2. Putting $a_1 = a_2 = a_3 = \dots$ and $c_1 = c_2 = c_3 = \dots = c$, we find as before that the modulus of an unweighted average of n quantities, conforming to a curve with modulus c, is $\frac{c\sqrt{n}}{n} = \frac{c}{\sqrt{n}}$.

3. If H is the difference between two quantities whose mean values are the same, and moduli $c_1$ and $c_2$, then η, the deviation in H, $= \epsilon_1 - \epsilon_2$; modulus for η $= \sqrt{c_1^2 + c_2^2}$.

4. If the two quantities are the averages of $n_1$ and $n_2$ quantities with moduli $v_1, v_2$, then by corollary 2, $c_1 = \frac{v_1}{\sqrt{n_1}}$, $c_2 = \frac{v_2}{\sqrt{n_2}}$, and modulus for difference between the quantities is by corollary 3, $\sqrt{\frac{v_1^2}{n_1} + \frac{v_2^2}{n_2}}$.

5. In particular, the modulus for the difference between two quantities from one group, modulus c, is $c\sqrt{2}$.

Corollaries 3, 4, and 5 can be proved directly by the method 
of Case I. 

Precision of an Average. 

The second corollary, that the precision of an average is
proportional to the square root of the number
of terms it contains, is so important that an
independent proof may be offered, starting from different
assumptions.

Suppose a great number (m) of observations to be made of a single unknown quantity, e.g., the declination of a star. Let r be the "probable error" of a single observation, h the precision of the group, v the true, $v + d_1, v + d_2, \dots v + d_m$ the observed values of the quantity. Let $x_0$ be the arithmetic mean of the m observed values, and let $x_0 = v + \delta$. Then δ is the error of the arithmetic mean.

Let $\delta_1, \delta_2, \dots \delta_m$ be the differences between the observed values and $x_0$, so that $v + d_1 = x_0 + \delta_1 = v + \delta + \delta_1$; $d_1 = \delta + \delta_1$, $d_2 = \delta + \delta_2$, &c.; also $mx_0 = \Sigma(x_0 + \delta_t) = mx_0 + \Sigma\delta_t$; and $\Sigma\delta_t = 0$.



* This should be compared with pp. 204-214, supra. 


Let $P_1$ be the probability that this set of observations concur. Then

$$P_1 = \frac{h}{\sqrt{\pi}}e^{-h^2d_1^2} \times \frac{h}{\sqrt{\pi}}e^{-h^2d_2^2} \times \dots \text{ to } m \text{ products} = \frac{h^m}{\pi^{m/2}}\,e^{-h^2(d_1^2 + d_2^2 + \dots)}$$

$$= \frac{h^m}{\pi^{m/2}}\,e^{-h^2\Sigma(\delta_t^2)} \times e^{-h^2m\delta^2}, \text{ since } \Sigma(\delta_t) = 0.$$

$P_1$ is the probability that the observed values yield an error δ in their arithmetic mean. Let $P_0$ be the probability that the observed values yield no error in their arithmetic mean, then

$$P_0 = \frac{h^m}{\pi^{m/2}}\,e^{-h^2\Sigma(\delta_t^2)},$$

$$P_1 = P_0 \times e^{-h^2m\delta^2}.$$

Hence the arithmetic mean belongs to a curve of error whose precision is $\sqrt{h^2m} = h\sqrt{m}$, and therefore its probable error is $\frac{r}{\sqrt{m}}$.

If the errors $d_1, d_2 \dots$ occur $p_1, p_2 \dots$ times respectively in the observations, while $p_1 + p_2 + \dots + p_m = n$, the foregoing argument is unaffected, and the precision of the mean is $h\sqrt{n}$, that is $h\sqrt{p_1 + p_2 + \dots + p_m}$.

Care is needed to distinguish the hypotheses on which this 
formula, and the former formula -^— connecting weights, 

depend. 

A corresponding result may be obtained directly from the limit of the binomial expansion. If an experiment, for whose success the chance is p, is performed n times, the most probable number of successes is the nearest integer to pn, and the modulus for the various numbers is $\sqrt{2pqn}$. The modulus for the average of the n experiments is therefore $\frac{\sqrt{2pqn}}{n} = \frac{\sqrt{2pq}}{\sqrt{n}}$; that is, the precision is proportional to $\sqrt{n}$.

We can now obtain a formula for the modulus of a series of observations in a form often given. On p. 285 it is shown that if $\delta_1, \delta_2 \dots \delta_n$ are divergencies from their average of a series of observations, and if these divergencies conform to a law of error with modulus c, then c should be taken as $\sqrt{\frac{2\Sigma\delta^2}{n}}$, and the centre of the curve at the average, for maximum probability. Now the average from which these divergencies are measured conforms to a curve modulus $\frac{c_1}{\sqrt{n}}$, where $c_1$ is the modulus of the divergencies measured from their true value, not from their arithmetic mean;

then, if Δ is the divergence from the true value, modulus $c_1$,
δ is the divergence from the arithmetical mean, modulus c,
d is the divergence of the arithmetic mean from the true value, modulus $\frac{c_1}{\sqrt{n}}$,

and $c_1^2 = c^2 + \frac{c_1^2}{n}$, from page 304.

Hence $c_1^2 = \frac{nc^2}{n-1} = \frac{2\Sigma\delta^2}{n-1}$.

Since n is large these quantities are very nearly equal; and it is not worth while here to discuss their relative merits; the latter quantity $\sqrt{\frac{2\Sigma\delta^2}{n-1}}$ is generally used.*

As an example of this greater precision of averages, take the averages given on p. 289, each of 30 numbers, which range on a normal curve modulus 5; these averages are 25.9, 25.7, 24.4, 24.4, 24.5, 24.9, 24.8, 25.7, 24.5, 25.7. General average 25.05; differences, .85, .65, −.65, −.65, −.55, −.15, −.25, .65, −.55, .65; $\sqrt{\frac{2\Sigma\delta^2}{n-1}} = .89$. The modulus for such groups is by theory $\frac{5}{\sqrt{30}} = .91$ ...
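The arithmetic of this example can be checked with a few lines. This is a sketch only; the variable names are illustrative, and the modulus is taken as the square root of twice the sum of squared differences divided by n − 1, the form stated above as generally used.

```python
import math

# The ten averages (each of 30 numbers) quoted in the text.
averages = [25.9, 25.7, 24.4, 24.4, 24.5, 24.9, 24.8, 25.7, 24.5, 25.7]

n = len(averages)
mean = sum(averages) / n
sum_sq = sum((a - mean) ** 2 for a in averages)

# Modulus estimated from the divergencies about the arithmetic mean.
observed_modulus = math.sqrt(2 * sum_sq / (n - 1))

# Modulus expected for an average of 30 items drawn from a curve of modulus 5.
theoretical_modulus = 5 / math.sqrt(30)

print(round(mean, 2), round(observed_modulus, 2), round(theoretical_modulus, 2))
```

The two moduli come out at about .89 and .91, as stated.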

From the same page we may find the following 30 averages, each of 10 numbers: 23.6, 24.8, 24.4, 24.6, 25.3, 23.6, 25.7, 25.8, 26.6, 26.0, 24.5, 25.0, 24.7, 26.9, 25.5, 23.8, 24.3, 24.3, 24.0, 23.8, 24.4, 26.6, 24.4, 25.5, 25.6, 26.8, 24.0, 26.2, 25.6, 25.0.

The probable error for these is by theory .47 of $\frac{5}{\sqrt{10}} = .742$; and between the limits 25.04 ± .74 we actually find 15 out of the 30 averages (two more fall on the lower limit itself), while 6 are below the lower, and 7 above the higher
limit.
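The count may be repeated mechanically. The sketch below assumes the limits are 25.04 ± .74, which is what .47 × 5/√10 gives to two places; values are held in hundredths so that an average lying exactly on a limit is not misplaced by rounding. The variable names are illustrative.

```python
import math

# The thirty averages of 10 numbers each, as quoted in the text.
averages = [23.6, 24.8, 24.4, 24.6, 25.3, 23.6, 25.7, 25.8, 26.6, 26.0,
            24.5, 25.0, 24.7, 26.9, 25.5, 23.8, 24.3, 24.3, 24.0, 23.8,
            24.4, 26.6, 24.4, 25.5, 25.6, 26.8, 24.0, 26.2, 25.6, 25.0]

# Probable error by theory: .47 of 5 / sqrt(10).
probable_error = 0.47 * 5 / math.sqrt(10)

# Work in whole hundredths: centre 25.04, half-width .74 (assumed rounding).
hundredths = [round(a * 100) for a in averages]
lower, upper = 2504 - 74, 2504 + 74

below = sum(h < lower for h in hundredths)
on_limit = sum(h in (lower, upper) for h in hundredths)
inside = sum(lower < h < upper for h in hundredths)
above = sum(h > upper for h in hundredths)

print(round(probable_error, 3), below, on_limit, inside, above)
```

About half of the thirty averages fall within the probable error, which is what the definition of a probable error requires.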



* Vide article by Prof. Edgeworth, Camb. Phil. Soc. Trans., 1885.



The modulus for the whole 300 is $\frac{5}{\sqrt{300}} = .29$, and the probable error .14; the average for an infinite number would be 25; for the 300 selected it is 25.043, that is, well within the probable error.

Examples of this kind could be multiplied indefinitely. 

Samples. 

The bearing of this principle on the method of sampling is very important. Our experience on most subjects is derived, not by examining all the existing examples, but by noting a few which come in our way. A man of specialized experience is one who has seen and analyzed mentally many cognate phenomena. It needs no proof that the more samples taken, the more accurate will be the judgment formed about the group of which they are samples. Very many business transactions are decided by such an examination. Now we have seen that the precision of the average shown by samples of quantities which satisfy the normal law of error is proportional to the square root of their number; but there are three further questions to consider: (α) whether this rule applies to samples of quantities which do not conform to the law of error, that is, which would not be obtained from a normal distribution without great improbability; (β) how we are to measure the precision of either the original group of which we have samples or of our samples; (γ) whether we can learn anything more about the original group besides its average.

α. On referring back to page 303, it will be seen that the
averages of samples of, say, m quantities drawn at random from 
a large group whose distribution is not normal, will, if m is 
large in relation to the fluctuation of the original group, satisfy 
the law of error. The reason, apart from the mathematical 
analysis of this, is clearer from the following illustration : if 
we have records of a quantity, which fluctuates in accordance 
with the normal law about an average which changes slowly 
year by year, our measurements will not conform to the normal 
law ; but if we select four years at random again and again, 
we shall eliminate the influence of time, and our samples will 
tend to conform. Readers may experiment on the annual
birth-rates to illustrate this.






The following numbers are the death-rates per 10,000 in London registration districts, arranged in order of magnitude:

70, 70, 80, 92;
100, 107, 108, 108, 109;
113, 115, 115, 115, 116, 117, 118, 118;
120, 121, 121, 123, 123, 124, 125, 126, 126, 127, 128;
130, 130, 131, 132, 132, 132, 133, 135, 136, 137, 138, 138, 139, 139;
141, 141, 141, 142, 144, 144, 144, 144, 145, 145, 147, 148, 149;
150, 150, 150, 151, 151, 152, 152, 153, 158, 158;
160, 163, 164, 166, 167, 167, 168;
170, 177, 178;
181, [illegible], 183, 185, 188;
191, 194, 198, 198;
204, 205, 210, 211, 220, 222, 222, 223, 228;
230, 236, 237, 238;
252, 252, 255, 264, 264, 266, 276, 284, 286;
323, 329, 329, 404, 448, 475, 505, 622, 625, 1,408.



These numbers clearly do not conform to the normal curve. We will omit 1,408 as being so far from the others as to be in a class by itself, and select at random samples of 4, 18 times. Their averages are 174, 222, 226½, 221, 129, 150, 181½, 193, 300, 133, 216, 178, 167, 169½, 183, 150, 227, 164. Average, 188; modulus, 57.4. These fit a curve of error closely, thus:











Within (of average)    Observed.    Calculated from Table on p. 282.
      5                    2             1.7
      6½                   3             2.3
     10                    4             3.5
     14                    5             5.5
     18½                   6             6.3
     21                    7             7.1
     24                    8             8.0
     28                    9             9.2
     33                   10            10.5
     34                   11            10.7
     38                   12            11.3
     38½                  14            11.8
     39                   15            11.9
     55                   16            14.8
     59                   17            16.7
    112                   18            18.0

Thus the theorem is confirmed in a very unpromising case. 
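The average, the modulus, and the calculated column can be approximately reproduced as follows. This is a sketch under stated assumptions: the fractional averages are read as halves, the modulus is computed with divisor n (which comes nearest the printed 57.4), and the calculated numbers are taken to be n times the error function of distance divided by modulus. The names used are illustrative.

```python
import math

# Averages of the 18 random samples of 4 (fractions read as halves).
sample_means = [174, 222, 226.5, 221, 129, 150, 181.5, 193, 300,
                133, 216, 178, 167, 169.5, 183, 150, 227, 164]

n = len(sample_means)
mean = sum(sample_means) / n

# Modulus = sqrt(2 * sum of squared divergencies / n); divisor n is assumed.
modulus = math.sqrt(2 * sum((m - mean) ** 2 for m in sample_means) / n)

def expected_within(distance):
    """Number of the n averages expected within `distance` of the mean
    on a normal curve of error with the modulus found above."""
    return n * math.erf(distance / modulus)

print(round(mean, 1), round(modulus, 1))
for d in (5, 14, 33, 59, 112):
    print(d, round(expected_within(d), 1))
```

The figures agree with the printed column to within about a tenth, the small differences being attributable to rounding in the printed table.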

β. To determine the precision of the average of our samples, two methods are open. The first consists in finding the modulus $c = \sqrt{\frac{2\Sigma\delta^2}{n-1}}$ of all the quantities chosen; then if the quantities conform to a normal curve the modulus of their average is $\frac{c}{\sqrt{n}} = \sqrt{\frac{2\Sigma\delta^2}{n(n-1)}}$, and the precision is $\frac{\sqrt{n}}{c}$; if the quantities do not conform this formula still gives the best measure of the precision, but it may be well to confirm it by the second method. This method is to break up the n samples into $\frac{n}{m}$ smaller groups each of m, and see if the averages of these groups are such as would come from a normal distribution; if they do not, increase m; if they show signs of normal grouping in a curve of modulus $c_m$, before we have come to the limiting value of m, then we may expect that the larger sample of n things belongs to a normal curve, whose modulus is $\frac{c_m\sqrt{m}}{\sqrt{n}}$, which may be expected to be equal to $\frac{c}{\sqrt{n}}$.

If we do not get conformity with the largest value of m we can take, we have no guarantee that n is large enough to eliminate the abnormality of the original figures.

The following statistics of wages give a practical application of this principle.

In the period 1834-45 inquiries were made in the Scotch villages as to the day wages of agricultural labourers.

The resulting figures for the Lowlands may be tabulated as follows:



Wage       13d.  13½d.  14d.  15d.  16d.  16½d.  17d.  17½d.  18d.  18½d.
Numbers      5     3     2     8    12     6     24     3     39     3

Wage       19d.  20d.  21d.  22d.  23d.  23½d.  24d.  24½d.  25d.  27d.
Numbers     27    26    27    15     1     1      4     1      2     2

Average, 18.8d.; modulus, 3.62d. = c.



Correspondence with Law of Error.

                               Observed
Limits.        Normal.   Above Average.   Below Average.   Total.
18.8 ± ⅕c         46          27       +        3       =     30
       ⅖c         90          53       +       45       =     98
       ⅗c        127          53       +       69       =    122
       ⅘c        156          80       +       87       =    167
        c        178          95       +       87       =    182
      1⅕c        192          96       +       95       =    191
      1⅖c        201          97       +       97       =    194
      1⅗c        206         102       +      100       =    202
       2c        210         104       +      105       =    209
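The observed column can be recounted from the frequency table of wages. In this sketch the limits are taken at fifths of the modulus (the spacing that reproduces the normal column), the count at 15d. is read as 8 (the reading consistent with a total of 211), and the normal column is computed as the total times the error function; all of these are assumptions of the illustration, as are the names used.

```python
import math

# Day wages in pence -> number of returns (count at 15d. assumed to be 8).
wages = {13: 5, 13.5: 3, 14: 2, 15: 8, 16: 12, 16.5: 6, 17: 24, 17.5: 3,
         18: 39, 18.5: 3, 19: 27, 20: 26, 21: 27, 22: 15, 23: 1, 23.5: 1,
         24: 4, 24.5: 1, 25: 2, 27: 2}

total = sum(wages.values())
average, c = 18.8, 3.62                 # as printed in the text
fifths = [1, 2, 3, 4, 5, 6, 7, 8, 10]   # limits at k/5 of the modulus

# Returns lying within k/5 of the modulus on either side of the average.
observed = [sum(k for w, k in wages.items() if abs(w - average) <= f * c / 5)
            for f in fifths]

# Numbers expected on a normal curve of error.
normal = [total * math.erf(f / 5) for f in fifths]

for f, o, e in zip(fifths, observed, normal):
    print(f"{f}/5 c  observed {o:3d}  normal {e:6.1f}")
```

The recount gives the printed totals 30, 98, 122, ... 209, and the normal figures agree with the printed column to the nearest unit or so.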



When we divide the returns into 50 samples of 4 we get modulus for their averages 1.8; 25 samples of 8 give modulus 1.14; 40 samples of 5 give modulus 1.57; 20 samples of 10 give modulus 1.19.

The c for the original samples may be found from any of these; the results are:

Modulus of original samples                                   3.62
   "    "  calculated from the groups of 4,   1.8 × √4   =   3.6
   "    "      "        "   "    "    "  8,   1.14 × √8  =   3.2
   "    "      "        "   "    "    "  10,  1.19 × √10 =   3.8

This is a close consilience with theory. We will adopt 3.6 as the value of c; then the modulus of the average of the 211 original samples is $\frac{3.6}{\sqrt{211}}$, its precision $\frac{\sqrt{211}}{3.6}$, and its probable error .47 of $\frac{3.6}{\sqrt{211}} = .12 \dots$, or ⅛ of a penny.
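The second method reduces to a line or two of arithmetic, sketched below with illustrative names. The factor .47 is the one used in the text for converting a modulus into a probable error.

```python
import math

# Modulus of the averages of groups of m, as given in the text.
group_moduli = {4: 1.8, 8: 1.14, 5: 1.57, 10: 1.19}

# Each group modulus, multiplied by sqrt(m), estimates the modulus c
# of the original returns.
implied_c = {m: c_m * math.sqrt(m) for m, c_m in group_moduli.items()}

# Adopting c = 3.6 for the 211 original returns:
c, n = 3.6, 211
modulus_of_average = c / math.sqrt(n)
probable_error = 0.47 * modulus_of_average

for m in sorted(implied_c):
    print(m, round(implied_c[m], 1))
print(round(modulus_of_average, 3), round(probable_error, 2))
```

The groups of 5, not tabulated above, give a value of about 3.5 by the same rule.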

We should verify that the samples conform to the law of error: the following shows the comparison for the samples of 4:

                                          Observed
Limits.                      Normal.   Above Average.   Below Average.   Total.
18.8 ± ⅕ of modulus (1.8)       11           6       +        7       =    13
       ⅖     "                  21           7       +       11       =    18
       ⅗     "                  30          14       +       17       =    31
       ⅘     "                  37          14       +       23       =    37
       modulus                  42          18       +       24       =    42
      1⅕ of modulus             45          20       +       26       =    46
      1⅖     "                  48          23       +       26       =    49
      1⅗     "                  49          23       +       26       =    49
      1⅘     "                  49          23       +       27       =    50
       2 modulus                50          23       +       27       =    50



This resemblance is as close as the argument requires. 

γ. If our first samples conform to the law of error we know with reasonable certainty the average and the distribution of the original quantities: namely, that they conform to a normal curve with approximately the same average and modulus as our samples. The general average and the sample average differ in accordance with a law of error, modulus $\frac{c}{\sqrt{n}}$, where c is modulus for samples and n their number.

If our first samples do not conform, it is still probable that their curve of frequency has a resemblance to that of the original quantities. If the fraction $p_1$ of the original quantities lay between assigned limits $a_1$ and $a_2$, then the number to be expected between those limits in n samples is decided by the expansion of $(p_1 + q_1)^n$, where $p_1 + q_1 = 1$; the most probable number is the integer nearest $p_1n$, and the modulus is $\sqrt{2p_1q_1n}$; similarly if $p_2, q_2$ bear a similar relation to $a_2, a_3$, the most probable number selected between these limits is $p_2n$, and modulus $\sqrt{2p_2q_2n}$, and so on. Thus a similar distribution may be expected, and each part of it has a precision varying jointly as the square root of the whole number taken and the quantity $\sqrt{p(1-p)}$; thus the larger the number taken the greater will be the resemblance, and [since $\sqrt{p_1(1-p_1)} > \sqrt{p_2(1-p_2)}$ when $p_1 > p_2$ and $p_1 + p_2 < 1$] the larger the altitude of the area in the curve of frequency corresponding to given limits the greater its precision. The errors of the various divisions are not, however, entirely independent of one another. This is, of course, in strict accordance with the common sense of the question.
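The expected number between given limits, and its modulus, follow directly from the binomial rule just stated. The sketch below uses illustrative function names; the case worked is that of a quarter of the group, in 48 samples, lying between the average and a quartile.

```python
import math

def expected_count(p, n):
    """Most probable number (before rounding to an integer) of n samples
    falling in a division that holds the fraction p of the group."""
    return p * n

def count_modulus(p, n):
    """Modulus of that number, sqrt(2 p q n), with q = 1 - p."""
    return math.sqrt(2 * p * (1 - p) * n)

# A quarter of the group lies between the average and a quartile.
print(expected_count(0.25, 48), round(count_modulus(0.25, 48), 1))
```

With p = ¼ and n = 48 the expected number is 12 and the modulus about 4.2.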

The following examples of school ages illustrate part of this argument. In a school containing 257 boys of varying ages, where the dispersion was not likely to be normal, 48 were selected at random and their ages written down.

The modulus of the 48 samples is 43.2 months; their average 13 years 10 months; their distribution as follows:







                                    Observed
Limits.                Normal.   Above Average.   Below Average.   Total.
Average ± ⅕ modulus       11           3       +        6       =     9
          ⅖    "          21          10       +       10       =    20
          ⅗    "          29          12       +       16       =    28
          ⅘    "          36          13       +       21       =    34
          modulus         40          16       +       22       =    38
         1⅕    "          44          19       +       24       =    43
         1⅖    "          46          20       +       26       =    46
         1⅗    "          47          22       +       26       =    48

The observations are not grouped symmetrically nor in close 
agreement with the normal distribution. 

When we take the average of random samples we do not find close relation to the normal curve till the number of samples becomes too small to work with. To continue the process we assume that 48 is a large enough number of items to neutralize the want of symmetry in the figures. The average of the whole group is as likely as not to be within the limits 13 years 10 months ± $\frac{43.2}{\sqrt{48}} \times .47$ months, that is between 13 years 7 months and 14 years 1 month.

Again the quartiles in our samples are at 18 months above and 2 years below the average; the quartiles in the original group may be expected to be within the same distances with probable errors $\sqrt{2 \times \tfrac{1}{4} \times \tfrac{3}{4} \text{ of } 48} = 4.2$ months, since the chance that any quantity shall be between the average and the lower quartile is ¼.

From a census of the whole school it was found that all 
these conditions were fulfilled ; the average was 14 years ; the 
quartiles were unfortunately not kept ; but 58 boys out of the 
257 were stated to be over 15 years 9 months, from which it 
is highly probable that the upper quartile was within the given 
limits, 15 years 6 months ± 4 months; and 54 were below 11
years 10 months, which places the lower quartile also well
within the limits. 

Corollary 4, relating to the modulus of a difference, is most useful in comparing two groups selected as having certain qualities. Thus Professor Edgeworth* discusses whether an ascertained difference of 2 inches between the average heights of a large number of criminals and that of the general population is significant; and finding that the modulus for the difference between two random groups is only 0.08, holds that there is a cause of the difference in the method of selection; that is, that criminality and low stature are found together. We might apply the same principle to the investigation of the existence of a period in any figures; for if the modulus of the figures was c, the modulus for the difference between the averages of two random samples of 20 months each would be $c\sqrt{\frac{1}{20} + \frac{1}{20}} = \frac{c}{\sqrt{10}}$; if the difference between the averages of the figures for 20 Decembers and 20 Junes was 3 times this quantity the existence of a period would be established. For instance, in the percentage of ironfounders unemployed monthly from 1855 to 1874† the modulus for single months

* Statistical Journal, Jubilee Number, 1885.
† See p. 179, supra.




is about 30, and for the difference between the averages of two groups, one of 20 and the other of 240, is therefore $30\sqrt{\frac{1}{20} + \frac{1}{240}} = 7$; but the average of the 20 Decembers is about 29 % above the general average, a significant difference; and the average of the 20 Augusts is about 19 % below, a divergence smaller than before, but still significant; the difference between the Decembers and Augusts, namely, 48, is to be compared with the modulus $30 \times \sqrt{\frac{1}{20} + \frac{1}{20}} = 9$, and is therefore significant.
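The comparison for the ironfounders may be set out as a short calculation. The sketch uses an illustrative function name, and takes "3 times the modulus" as the working test of significance, as suggested in the passage above.

```python
import math

def modulus_of_difference_of_means(c, n1, n2):
    """Modulus for the difference between the averages of two random
    groups of n1 and n2 items, each item having modulus c."""
    return c * math.sqrt(1 / n1 + 1 / n2)

c = 30  # modulus for single months, as stated in the text

dec_vs_all = modulus_of_difference_of_means(c, 20, 240)  # Decembers v. all months
dec_vs_aug = modulus_of_difference_of_means(c, 20, 20)   # Decembers v. Augusts

print(round(dec_vs_all, 2), round(29 / dec_vs_all, 1))   # divergence of 29
print(round(dec_vs_aug, 2), round(48 / dec_vs_aug, 1))   # divergence of 48
```

Evaluated directly the second modulus is nearer 9.5 than 9; on either figure the divergence of 48 is about five times the modulus.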

A final example may be given which brings into relation many of these theorems. The following were the recorded times for "The Oaks" from 1850 to 1899; we will discuss whether there has been a significant increase of speed, or some change in the conditions of the race, or whether the fluctuations are due to minor causes varying year by year.

min. sec. min. sec. min. sec. min. sec. min. sec. 
1850 — 2 56 1860—2 56 1870—2 52 1880—2 49 1890—2 40^ 
185I — 2 52 1861— 2 44 187I— 2 51 1881— 2 46 189I— 2 54f 

1852—3 o 1862—2 49 1872—2 52 1882—2 49 1892—2 43I 

1853—2 52 1863—2 54 1873—2 50I 1883—2 53 1893—2 44^ 
1854—3 O 1864—2 47 1874 — 2 48^ 1884—2 49 1894—2 50 

1855—2 58 1865—2 51 1875-2 49^ 1885—2 43f 1895—2 48J 

1856—3 4 1866—2 53 1876—2 SO 1886—2 54f 1896—2 45f 

1857—2 so 1867—2 54 1877—2 S4i 1887—2 sof 1897—2 45 

1858—2 53^ 1868—2 47i 1878—2 54 1888-2 42^ 1898—2 45t 

1859—2 55 1869—2 59 1879—3 2 1889—2 45 1899—2 44 

Ten yearly 
averages 2 56 2 51^ 2 52} 2 48 2 47 

These figures fit fairly closely a normal curve of error with modulus 7.43 secs., average 2 min. 50.87 secs. The modulus for the difference between two is therefore $7.43\sqrt{1 + 1} = 10.48$ secs. The greatest difference between consecutive years is 14 secs., between 1856 and 1857; this is not sufficiently far beyond the modulus to make it uncommon; hence there is no proof of any sudden change in arrangements having taken place between two races. The difference between the times for years early in the period and those later is sometimes as much as 20 secs. The modulus for the difference between the averages for two periods of 10 years is $7.43\sqrt{\frac{1}{10} + \frac{1}{10}} = 3.3$. The difference between the averages for 1850-59 and 1890-99 is 9 secs., which is significant; that between 1850-59 and 1880-89 is also significant. The odds against such a difference as that between the average times of 1850-59 and 1860-69 are only 13 to 1, not very significant. Hence we find that some cause was at work which gradually quickened the race between the fifties and the eighties.
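The moduli, and the chance of a given difference arising at random, can be evaluated as in the sketch below. The chance is taken as the complementary error function of the difference divided by the modulus, which is an assumption about how the probabilities were obtained; the difference of 8 secs. is that between the ten-yearly averages 2 min. 56 and 2 min. 48; the names are illustrative.

```python
import math

c = 7.43  # modulus of the recorded times, in seconds

# Modulus for the difference between two single years.
consecutive = c * math.sqrt(1 + 1)

# Modulus for the difference between two ten-year averages.
decades = c * math.sqrt(1 / 10 + 1 / 10)

def chance_of_exceeding(difference, modulus):
    """Chance that a random difference is at least this large in
    either direction, on a normal curve of error."""
    return math.erfc(difference / modulus)

chance_8 = chance_of_exceeding(8, decades)   # 1850-59 against 1880-89

print(round(consecutive, 2), round(decades, 2), round(chance_8, 4))
```

Direct evaluation gives about 10.5 secs. for the first modulus, close to the printed figure, 3.3 secs. for the second, and a chance of roughly .0006 or .0007 for the difference of 8 secs.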

This method can be applied to the criticism of such serial figures as birth, death, and marriage rates, imports, exports, and so on. With a periodic series the method can be used first for establishing the period, and then for investigation of the figures found when the periodicity is eliminated. With a symptomatic* curve, the method can be used for measuring the symptomatic tendency, and then for studying the short-period fluctuations. For a series which has no symptom and no period, the method is at once applicable for finding what divergencies are significant, and for forecasting and interpolating numbers. Without some machinery of calculation of this kind we are unable to get beyond vague and general impressions of the existence of a change;† but if we take care that the conditions of the calculation are satisfied, we can by the method now developed make a definite statement quite independent of personal bias, such as "either an event has happened, so improbable as to be outside the range of human experience, or the decrease shown in the series of figures in question is due to some significant change in the system of causes which produce them."

* See p. 240, supra.

† We can take an intermediate step by noticing in the above table that in nine cases out of ten the times in the decade 1880-9 are less than the times thirty years earlier; the chance that so great an agreement in the direction of the change (irrespective of its magnitude) should come in a random selection is $\frac{11}{512}$ or .0215; the chance as calculated above is .0006. See Edgeworth in Jubilee Volume of Statistical Journal, pp. 213-217.






Section VI. — The Theory of Correlation. 

It is never easy to establish the existence of a causal connection between two phenomena or series of phenomena; but a great deal of light can often be thrown by the application of algebraic probability. We have already dealt with some cases in point; we have shown how to find whether an event is due to a special cause, or whether it naturally arises from the variation of existing causes; we have shown how to measure the significance of the difference between two quantities or two averages; and further, we have investigated such problems as the influence of the seasons.* In many large groups of phenomena we can apply a more refined and more certain method, which it is our object to introduce in this section. When two quantities are so related that the fluctuations in one are in sympathy with the fluctuations of the other, so that an increase or decrease of one is found in connection with an increase or decrease (or inversely) of the other, and the greater the magnitude of the changes in the one, the greater the magnitude of the changes in the other, the quantities are said to be correlated. Correlation is a quantity which can be measured numerically; and its measurement has been the subject of much recent mathematical investigation.

Let two variable quantities X, Y be subject to variations x, y, which are due to a multitude of individually unimportant causes, producing fluctuations $e_1, e_2, \dots$; $\epsilon_1, \epsilon_2, \dots$, so that the x's are connected with the e's and the y's with the ε's by the equations,

$x = a_1e_1 + a_2e_2 + \dots + a_ne_n$, $y = b_1\epsilon_1 + b_2\epsilon_2 + \dots + b_n\epsilon_n$, where $a_1, a_2 \dots b_1, b_2 \dots$ are constants.

Then x and y conform to normal curves of error, whose moduli we will call $c_1, c_2$.

The rest of our investigation, which is based closely on Professor Karl Pearson's paper on "Regression, Heredity, and Panmixia,"† proceeds on the assumption that the e's and ε's

* See p. 186, supra.

† Transactions of the Royal Society, vol. 187 (1896), A. 253-318.






THE THEORY OF CORRELATION. 317 

conform to normal curves of error, which is not, however, the 
most general condition. 

Let individual values of X and Y, x and y respectively, be grouped in pairs, as measurements of two quantities at the same date, or of two parts of the same organism, or in any other way. If x and y are quite independent, none of the causes producing them are common to both, and the e's are independent of the ε's in the above equations. Then z, the chance of divergencies x and y concurring $= e^{-\frac{x^2}{c_1^2}} \times e^{-\frac{y^2}{c_2^2}} \times$ (a constant).

For any one value of x, the quantities y are grouped about the mean value Y, in accordance with the normal curve $c_2$; and similarly for any one value of y.

The above equation may be written $z = C\,e^{-\left(\frac{x^2}{c_1^2} + \frac{y^2}{c_2^2}\right)}$. If we give z any definite value k, the x's and y's which have jointly the probability k are connected by the equation

$$\frac{x^2}{c_1^2} + \frac{y^2}{c_2^2} = \log_e\frac{C}{k},$$

which is the equation of an ellipse having its principal axes coincident with the axes of measurement of x and y, if we suppose x and y measured on two horizontal lines perpendicular to one another. Let z be measured vertically; then in the surface given by the equation connecting z, x, y all the horizontal sections are similar ellipses, whose projections on a horizontal plane are concentric and similarly situated,* while all the vertical sections are normal curves of error.†

This is the surface of no correlation.

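If, on the other hand, any of the e's coincide with any of the ε's, it may be shown that a new term is introduced in the equation between z, x, and y, which becomes

$$z = \frac{n}{\pi c_1c_2\sqrt{1 - r^2}}\;e^{-\left(\frac{x^2}{c_1^2} - \frac{2rxy}{c_1c_2} + \frac{y^2}{c_2^2}\right)\frac{1}{1 - r^2}},$$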

* For a diagram of this projection and for a general discussion of correlation on the same lines as this chapter, but more advanced and complete, see Mr Udny Yule's paper on The Theory of Correlation in the Statistical Journal, Dec. 1897.

† For the section by a plane $y = mx + n$ is $z = C\,e^{-\left(\frac{x^2}{c_1^2} + \frac{(mx + n)^2}{c_2^2}\right)}$, which may be written $z = e^{-A(x - B)^2} \times D$ where A, B, and D are constants.

where n is the number of pairs of observations and r is a quantity we have still to decide; this is the general equation of the normal correlation surface. The horizontal sections, obtained by giving z different constant values, are now of the form

$$\frac{x^2}{c_1^2} - 2r\frac{xy}{c_1c_2} + \frac{y^2}{c_2^2} = \text{a quantity independent of } x \text{ and } y.$$

The projections of the horizontal sections are still concentric, similar and similarly situated ellipses, but their principal axes are now inclined to the axes of x and y. The vertical sections are still normal curves of error with various centres; in particular the frequencies of the values of y found in conjunction with $x_1$, a particular value of x, are given by the equation

$$z = e^{-\left(\frac{x_1^2}{c_1^2} - \frac{2rx_1y}{c_1c_2} + \frac{y^2}{c_2^2}\right)\frac{1}{1 - r^2}} \times \text{constant} = e^{-\frac{1}{c_2^2(1 - r^2)}\left(y - r\frac{c_2}{c_1}x_1\right)^2} \times \text{constant}.$$

This is a normal curve of error with its centre at $r\frac{c_2}{c_1}x_1$.
Thus the mean value of y corresponding to $x_1$, a given value of x, is $r\frac{c_2}{c_1}x_1$.

These mean values all lie on the line $\frac{y}{c_2} = r\frac{x}{c_1}$.

Similarly the mean values of x, corresponding to given values of y, lie on the line $\frac{x}{c_1} = r\frac{y}{c_2}$.

r is called the coefficient of correlation. If r is positive, for every given value of x, the mean value of the corresponding y's is positive and a definite fraction of x; if r is negative, the correlation is said to be negative, and for every given value of x, the mean value of the corresponding y's is a definite negative fraction of x.

To determine the value of r, we must observe that this single quantity determines the shape of the whole surface, when $n, c_1, c_2$ are given, just as the modulus determines the shape of the curve of error. We decided the best value of the modulus* by considering from what curve of error the observed values would arise with least improbability. Professor Pearson finds the value

* See p. 283, supra.



of r by considering from what distribution of z, x, and y (i.e., from what surface of correlation) the observed pairs would arise with least improbability; r is thus found to be $\frac{\Sigma xy}{n\sigma_1\sigma_2}$ or $\frac{1}{n}\Sigma\left(\frac{x}{\sigma_1} \cdot \frac{y}{\sigma_2}\right)$, the summation being extended over all pairs of x, y, where $\sigma_1, \sigma_2$ are the errors of mean square of the x's and y's respectively, and hence $\sigma_1 = \frac{c_1}{\sqrt{2}}$, $\sigma_2 = \frac{c_2}{\sqrt{2}}$.

But with other values of r the observed pairs might have been obtained with greater or less improbability, and these values are distributed in accordance with a normal curve of error whose probable error is $.67\frac{1 - r^2}{\sqrt{n}}$;* that is, when from all the possible correlation surfaces, which might have resulted in the observed values, those whose correlation coefficients are within the limits $r \pm .67\frac{1 - r^2}{\sqrt{n}}$ are selected, the sum of their probabilities is ½.

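It will be useful to examine the limits of the possible values of r.

r always lies between +1 and −1.

For $n^2\sigma_1^2\sigma_2^2 - (\Sigma xy)^2 = \Sigma x^2 \cdot \Sigma y^2 - (\Sigma xy)^2$, since $\sigma_1 = \sqrt{\frac{\Sigma x^2}{n}}$, $\sigma_2 = \sqrt{\frac{\Sigma y^2}{n}}$,

$= (\lambda_1^2y_1^2 + \lambda_2^2y_2^2 + \dots)(y_1^2 + y_2^2 + \dots) - (\lambda_1y_1^2 + \lambda_2y_2^2 + \dots)^2$,

(where $(x_1, y_1), (x_2, y_2) \dots$ are pairs of observations, and $\frac{x_1}{y_1} = \lambda_1$, $\frac{x_2}{y_2} = \lambda_2 \dots$)

which is zero if $\lambda_1 = \lambda_2 = \lambda_3 = \dots$, but otherwise positive. Hence $n^2\sigma_1^2\sigma_2^2 - (\Sigma xy)^2$ is positive,

$$1 > \frac{(\Sigma xy)^2}{n^2\sigma_1^2\sigma_2^2}, \qquad 1 > r^2,$$

and r is between +1 and −1, except when $\lambda_1 = \lambda_2 = \lambda_3 = \dots$; in this case $r = \pm 1$, and the correlation is said to be perfect, positively or negatively.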

* See Pearson, loc. cit., p. 226; Yule, loc. cit., p. 847; and Pearson, Proceedings of Royal Society, Oct. 1897. By a similar line of reasoning the probable error of c as determined on p. 283 is found to be $.477\frac{c}{\sqrt{n}}$.

It may be noticed that on à priori grounds without any mathematical investigation the formula $\frac{1}{n}\left(\frac{x_1}{\sigma_1} \cdot \frac{y_1}{\sigma_2} + \frac{x_2}{\sigma_1} \cdot \frac{y_2}{\sigma_2} + \dots\right)$ gives a good measure of correlation. For if there is positive correlation, whenever we have a positive value of x we may expect a positive value of y, and whenever we have a negative value of x we may expect a negative value of y, and each such term increases the coefficient; while, if there is no correlation, for any value of x occurring several times, we may expect positive and negative values of y which on the whole give a very small sum. Meanwhile the denominators at once bring the deviations into relation with the mean deviations, and prevent the whole coefficient becoming greater than unity.

The measurement of correlation.

We see then that r measures the correspondence between
deviations from their means of the two series of observations.
If the deviations are in exactly the same ratio for
all pairs, the correlation is perfect, and r = 1; while
r tends to zero when for a given deviation in one of the series
we have excess and defect with equal frequency in the other.

r serves as a measure of any statement involving two quali-
fying adjectives, which can be measured numerically, such as
"tall men have tall sons," "wet springs bring dry summers,"
"short hours go with high wages."

When r is not greater than its probable error $.67\,\frac{1-r^2}{\sqrt{n}}$ we

have no evidence that there is any correlation, for the observed 
phenomena might easily arise from totally unconnected causes ; 
but when r is greater than, say, 6 times its probable error, we 
may be practically certain that the phenomena are not indepen- 
dent of each other, for the chance that the observed results would 
be obtained from unconnected causes is practically zero. 
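
The rule of thumb above can be put into a few lines of Python. This is only a sketch: the function names are hypothetical, and it assumes the probable error of r is taken as .67(1 - r²)/√n, the expression used in the worked examples of this section.

```python
import math

def probable_error(r, n):
    """Probable error of r, taken as .67 * (1 - r^2) / sqrt(n)."""
    return 0.67 * (1 - r * r) / math.sqrt(n)

def ratio_to_probable_error(r, n):
    """How many times its own probable error the coefficient amounts to."""
    return abs(r) / probable_error(r, n)

# r = -.30 from 20 pairs: only about twice its probable error, so weak evidence.
weak = ratio_to_probable_error(-0.30, 20)

# r = .8 from 358 pairs: far beyond six times its probable error.
strong = ratio_to_probable_error(0.8, 358)
```

A ratio of one or less gives no evidence of correlation; a ratio above about six is the threshold the text treats as practical certainty.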

The calculation of r is quite simple, and if we can assume 
normal dispersion, so that the probable error in a series is 
equal to .67 of the error of mean square,* can be performed 
very rapidly. In the following tables the correlation between 
the prices of wheat, foreign trade, and the marriage-rate, already 
discussed by the help of the graphic method, is investigated. 

* Hence σ = about ¾ of distance between quartiles.






Examples of Correlation.

Marriage-Rate and Price of Wheat, 1845-64.

Years.   Marriage-   Differences.   Price of Wheat.   Differences   Products of     Imports and
         rate.                        s.    d.        (pence).      Differences.    Exports (£ mln.).
1845     17.2        +  .44          50    10         -  20         -   8           143
1846     17.2        +  .44          54     8         +  26         +  11           131
1847     15.8        -  .96          69     9         + 207         - 199           142
1848     15.9        -  .86          50     6         -  24         +  21           142
1849     16.2        -  .56          44     3         -  99         +  56           162
1850     17.2        +  .44          40     3         - 147         -  65           172
1851     17.2        +  .44          38     6         - 168         -  74           185
1852     17.4        +  .64          40     9         - 145         -  92           187
1853     17.9        + 1.14          53     3         +   9         +  10           222
1854     17.2        +  .44          72     5         + 239         + 105           249
1855     16.2        -  .56          74     8         + 262         - 147           239
1856     16.7        -  .06          69     2         + 200         -  12           288
1857     16.5        -  .26          56     4         +  46         -  12           310
1858     16.0        -  .76          44     2         - 114         +  87           281
1859     17.0        +  .24          43     9         - 119         -  29           309
1860     17.1        +  .34          53     3         +   9         +   3           346
1861     16.3        -  .46          55     4         +  34         -  16           342
1862     16.1        -  .66          55     5         +  35         -  23           340
1863     16.8        +  .04          44     9         -  93         -   4           396
1864     17.2        +  .44          40     2         - 148         -  63           435
Av.      16.76                  Av.  52     6                  Σxy = - 461

Correlation between marriage-rate and the price of wheat:

$\sigma_1 = .580$, $\sigma_2 = 133$,

$r = \frac{-461}{20 \times 133 \times .58} = -.30$.

Probable error of r = .14

Correlation between marriage-rate and imports and exports:

$\sigma_1 = .580$, $\sigma_2 = 90$,

$\Sigma xy = 8$, $r = +.007$.

Probable error of r = .15



* For a complete examination of the particular question here used for illustration,
see Mr Hooker's paper in the Statistical Journal, September 1891.






Marriage-Rate and Price of Wheat, 1875-94.

Years.   Marriage-   Differences.   Price of Wheat.   Differences   Products of     Imports and
         rate.                        s.    d.        (pence).      Differences.    Exports (£ mln.).
1875     16.7        + 1.53          45     2         +  88         + 135           597
1876     16.5        + 1.33          46     2         + 100         + 133           576
1877     15.7        +  .53          56     9         + 227         + 120           593
1878     15.2        +  .03          46     5         + 103         +   3           562
1879     14.4        -  .77          43    10         +  72         -  55           554
1880     14.9        -  .27          44     4         +  78         -  21           634
1881     15.1        -  .07          45     4         +  90         -   6           631
1882     15.5        +  .33          45     1         +  87         +  29           654
1883     15.5        +  .33          41     7         +  45         +  15           667
1884     15.1        -  .17          35     8         -  26         +   4           623
1885     14.5        -  .67          32    10         -  60         +  40           584
1886     14.2        -  .97          31     0         -  82         +  80           563
1887     14.4        -  .77          32     6         -  64         +  49           584
1888     14.4        -  .77          31    10         -  72         +  55           622
1889     15.0        -  .17          29     9         -  97         +  16           676
1890     15.5        +  .33          31    11         -  72         -  23           684
1891     15.6        +  .43          37     0         -  10         -   4           683
1892     15.4        +  .23          30     3         -  91         -  21           651
1893     14.7        -  .47          26     4         - 138         +  65           623
1894     15.1        -  .07          22    10         - 180         +  13           624
Av.      15.17                  Av.  37    10                  Σxy = 627

Correlation between marriage-rate and the price of wheat:

$\sigma_1 = .651$, $\sigma_2 = 102$, $\Sigma xy = 627$

[Or distance between quartiles = .9, whence $\sigma_1 = .67$]

$r = \frac{627}{20 \times 102 \times .651} = +.47$

Probable error = .1

Correlation between marriage-rate and imports and exports:

$\sigma_1 = .651$, $\sigma_2 = 41$,

$r = +.25$

Probable error of r = .14



Hence there was slight negative correlation between the
marriage-rate and price of wheat before 1864, that is, the
marriage-rate fell when wheat rose; but since 1864 there is
better evidence that the marriage-rate rises when wheat rises.
The marriage-rate and foreign trade were quite uncorrelated
before 1864, and show only slight correlation at more recent
dates; the odds against the correspondence between the ob-
served figures, since 1875, arising without causal connection are
only about 4 to 1, if we assume that the figures for each year
are independent of the next.
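
The arithmetic behind the 1875-94 wheat coefficient can be retraced as follows. This is a sketch only: the list holds the twenty yearly products of differences as read from the table, with signs taken from the signs of the two difference columns (an assumption where the scan is unclear), and the standard deviations .651 and 102 are those quoted in the text.

```python
import math

# Products of (marriage-rate difference) x (wheat-price difference in pence),
# 1875 to 1894, as read from the table; signs are inferred, not printed clearly.
products = [135, 133, 120, 3, -55, -21, -6, 29, 15, 4,
            40, 80, 49, 55, 16, -23, -4, -21, 65, 13]

n = len(products)          # 20 years
sigma_marriage = 0.651     # standard deviation of the marriage-rate
sigma_wheat = 102          # standard deviation of the wheat price, in pence

sum_xy = sum(products)
r = sum_xy / (n * sigma_wheat * sigma_marriage)

# Probable error, taken as .67 * (1 - r^2) / sqrt(n).
probable_error = 0.67 * (1 - r * r) / math.sqrt(n)
```

The sum of the products reproduces the printed total of 627, and r comes out at about + .47, roughly four times its probable error.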

The Galtonian method.

An earlier method of estimating correlation, introduced by
Mr Galton,* is very useful for a rapid survey of
two groups of figures. As a simple example
adequately illustrating the method, we will take two series
where the correlation is likely to be great, namely, the daily
records of maxima and minima temperature recorded for 1898.*

* See Proceedings of the Royal Society, 1886, vol. xl., Family Likeness
in Stature.

We first make a rapid survey of the series, and notice that
the minima range from about 23° to 63°, and the maxima
from about 35° to 95°. Divide each of these ranges into, say,
10 equal parts, and draw up the framework of a table like
the annexed. Turn through the records and enter the maxi-
mum and minimum for each day by a dot in the appropriate
place; thus on 4th October the maximum was 61.3° and the
minimum 51.3°; a dot should be put in the row 60°-64.9° under
the heading 51°-54.9°. When all the dots are entered, replace
them by their number in each square. The table shows the
result for 358 days. If there is correlation, it will be found
that the medians, or arithmetic averages, of each row form an
orderly progression, and similarly for each column. These
medians are roughly estimated and given in the table.

CORRELATION OF DAILY MAXIMA AND MINIMA OF TEMPERATURE IN 1898.

                                          MINIMA.
MAXIMA.         Not below   23°  27°  31°  35°  39°  43°  47°  51°  55°  59°  63°    Number    Medians of
                but below   27°  31°  35°  39°  43°  47°  51°  55°  59°  63°   -     within    corresponding
                                                                                     limits.   minima.
35° & below 40°              3    2    9    -    -    -    -    -    -    -    -       14       32°
40° & below 45°              2    7   12    7    6    -    -    -    -    -    -       34       33°
45° & below 50°              1    4   17   13   11    5    -    -    -    -    -       51       36°
50° & below 55°              -    4    5   12   12   11    8    -    -    -    -       52       40½°
55° & below 60°              -    1    1    3   15   16   15    2    1    -    -       54       44°
60° & below 65°              -    -    1    3    3   16   15   10    2    -    -       50       47°
65° & below 70°              -    -    -    1    7    3    9    5    3    -    -       28       46°
70° & below 75°              -    -    -    -    -    2    7   10    5    2    -       26       53°
75° & below 80°              -    -    -    -    -    3    5    7    9    2    -       26       54°
80° & below 85°              -    -    -    -    -    -    -    1   10    5    -       16       58°
85°                          -    -    -    -    -    -    -    -    -    4    3        7       [illegible]
Number falling
within limits                6   16   45   39   54   56   59   35   30   13    3    TOTAL 356
Medians of corre-
sponding maxima             40°  45°  45°  50°  54°  59°  62°  70°  77°  82°  87°

             Median.        Quartiles.     Distance between Quartiles.
MINIMA       [illegible]    37°   51°      14°
MAXIMA       57½°           49°   67°      18°

To test the correlation of the minima relative to the maxima
a diagram is drawn. Choose scales so that the distance between
the quartiles of the maxima (18°) shall be represented by the
same length vertically, as represents the distance between the
quartiles of the minima (14°) horizontally. Place crosses hori-
zontally level with the middle points of the successive limits
of maxima and vertically above the positions on the scale of the
medians of the corresponding minima.

* Whitaker's Almanack.

Now draw two lines. The first through the positions of the
quartiles and median ($Q_1$, $Q_2$, M); this is the line of perfect corre-
lation, and with the scales we have chosen is at 45° to the hori-
zontal; draw another line through M,* passing as near as possible
to all the crosses. Draw any horizontal line PCN intersecting
the former lines as in the figure. The ratio of CN to PN is the
coefficient of correlation. If the line CM passes through all the
crosses and coincides with PM, the correlation is perfect. If
CM is perpendicular to PM, there is perfect negative correlation.
If CM is vertical there is no correlation. In the figure the ratio 
CN to PN is |. A rough test of the presence of correlation is to 
be obtained by noticing whether all the crosses above the median 
are on one side of NM and below the median on the other side. 

Relation between the methods.

There is a simple connection between the coefficient thus determined
and that obtained by the previous formula. On the diagram the scales are
so chosen that we replace $\frac{x}{\sigma_1}$, $\frac{y}{\sigma_2}$ by quantities $\xi$, $\eta$ measured
by equal units. Then if $(\xi_1, \eta_1), (\xi_2, \eta_2), \dots$ are the positions
of the 358 original pairs the line $\eta = r\xi$ can be shown to be that whose mean†
distance from these points is a minimum when r has its value previously
given. It is easily seen that in the figure the ratio $\frac{CN}{PN}$ is $r_1$, if $\eta = r_1\xi$ is the
equation of a line through M referred to horizontal and vertical axes. Hence
the line CM might be drawn from the original formula by taking $\frac{CN}{PN} = \frac{\Sigma(\xi\eta)}{n}$;
in other words, we have here a graphic method of finding the coefficient of
correlation.

Calculating r roughly from the data $\sigma_1 = 12.7$, $\sigma_2 = 9.1$, $n = 358$, $\Sigma xy = 32130$,
$r = \frac{32130}{358 \times 12.7 \times 9.1} = .8$ approx.; that is, we obtain approximately the
same value by either method.

Regression.

Mr Galton applied this method to the question of inheritance
of stature. He found that the coefficient of correlation between
the statures of children and of their (mid-) parents
was ⅔. That is if a group of (mid-) parents had an
average stature x inches above (or below) the general average,
the average for their sons was only ⅔x inches above (or below)
the general average. This return towards the average is called
in biological language "regression"; such equations as $y = r\frac{\sigma_2}{\sigma_1}x$,
$x = r\frac{\sigma_1}{\sigma_2}y$ are the "equations of regression"; and the quantities
$r\frac{\sigma_2}{\sigma_1}$ and $r\frac{\sigma_1}{\sigma_2}$ are the "coefficients of regression," the one when
the sons of a group of parents of given height are compared with
their parents, the other when the parents of a group of sons of
given height are compared with their sons, where $\sigma_1$, $\sigma_2$ are the
standard deviations in height for the parents and sons respectively.

* Perhaps the best way to do this is to rotate a line about M till the number
of crosses on its two sides are equal.

† For the sense in which this word is to be taken, see Yule, loc. cit., pp.
817-8.

There is an intimate relation between the law of error and 
biological theory. The law of error and other cognate laws 

give algebraic expression to the universal ten- 
dency to variation, whether we are dealing with 
any part of the social organism to whose measurement we have 
in this book limited statistics, or with any measurable organ of 
an animal or vegetable. The law of heredity can be only tested 
numerically by the theory of correlation ; the effect of natural 
selection is easily considered with the help of the coefficient of 
regression. For if there is no selection, the distance from the 
general average of the mean stature of successive generations, 
descended from a group whose mean deviation was x, will be 
$rx, r^2x, \dots, r^nx$ if r remains unchanged, a series whose terms
rapidly tend to zero. If on the other hand a selection is made 
in each generation of those above the average, the divergence 
can be preserved and intensified. The discussion of this point 
would lead us too far afield. 
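
The decay of an initial deviation under repeated regression can be sketched in Python. The function name is hypothetical, and both the starting deviation of 3 inches and the coefficient of 2/3 are illustrative assumptions.

```python
def regression_series(x, r, generations):
    """Mean deviations r*x, r^2*x, ..., r^n*x of successive generations
    descended from a group whose mean deviation was x, with no selection."""
    return [x * r ** k for k in range(1, generations + 1)]

# Illustrative values: a group 3 inches above the average, r = 2/3.
series = regression_series(3.0, 2 / 3, 5)
```

Each term is the previous one multiplied by r, so with r below unity the series shrinks geometrically towards the general average.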

Conclusion.

In this Second Part we have only discussed the elements of
the subject, the theorems and formulae which writers on statistics
now assume. We have examined only the normal
curve of error, postponing the asymmetrical curve
of error to the appendix, and have not touched algebraic formulae 
arising from different hypotheses, or correlation between more than 
two variables. In the region to which we have confined ourselves, 
however, we have had to deal with arguments of the same 
nature as are to be met with in the higher paths of statistics. 
The great difficulty which the student of economics encounters 
when dealing with the theory of error is the apparent slightness 
of relation between this theory and the facts with which he 
deals. This slightness is only apparent ; it is because the 
theory has not, in the form he meets it, been carried far enough 




to fit it to the very complex facts of human affairs that we do 
not get that exact correspondence we might desire. The 
theoretical distribution of error may be expected to underlie 
all phenomena, just as the attraction of gravity underlies the 
action of all machinery. We cannot explain the motion of 
machinery by gravity alone, we need to consider also other 
natural forces, not so easily measured as gravity ; but still less 
can we explain that action if we ignore the force of gravity. 
It is hoped that the short treatment here given of the 
elements of so important a subject may make smoother the 
approach to a field of investigation where there is great promise 
of harvest but where the reapers are as yet few. 

Note, — While this book has been in the Press, an article by Prof. Pearson 
has appeared in the Philosophical Magazine, July 1900, violently criticising
the method adopted by most of his predecessors, who have investigated the
applicability of the Law of Error to Statistics, that is to say, the method of 
first deducing the equation from à priori considerations, and then comparing
the results with experiments. By means of a criterion of "fitting," which
should be carefully studied, he shows that the chances that the statistics, 
with which Airy and Merriman illustrate the theory, would have arisen 
from random sampling are only .01423 and .000,00155 respectively on their 
hypothesis, and deduces " that the normal curve possesses no special fitness 
for describing errors or deviations such as arise either in observing practice 
or in nature." It is to be remarked on this, first that the investigation of 
two examples does not prove his case, secondly that his criticism does not 
apply to such curves as the asymmetrical curve of error treated exhaustively 
by Professor Edgeworth, and thirdly that the claim of the authors, whom 
he treats with such contempt, is not that the fit is exact, but "that the 
formula represents with all practicable accuracy the observed frequency" 
(Airy, quoted by Pearson) or "that the agreement is very satisfactory"
(Merriman) : thus the authors in question make no claim that the normal 
law is the complete explanation of the observed errors, but are satisfied with 
the approximation they found : it was not to be expected that the pioneers in 
the field should attain finality. By a similar process the law of gravitation 
might be treated with derision by criticising the experiments of an Attwood's 
machine, when the resistance of the air was not considered. Prof. Pearson 
has four constants in the curve by which he attains a close fit in his Illustra- 
tion IV., and by increasing the number of his constants might obtain an 
absolute fit. With those developments of the normal curve of error, which 
depend on hypotheses very similar to those used by the earlier writers (see 
p. 303, supra, and Professor Edgeworth's recent contributions to the Statis-
tical Journal), more constants are present, and there is every likelihood that
equally close agreement may be found. The present author does not, however, 
wish to enter here into the controversy as to which is the best formula for 
classifying phenomena. His intention has been to follow in the beaten track, 
and there can be little doubt that the ordinary reader will prefer to find some
à priori justification for the unfamiliar theory that natural phenomena can be
represented by the formulae of algebraic probability, pace the author of The
Grammar of Science, though he may recognise that the ultimate justification
for the theory must be experience. There is no suggestion in this book that 
the whole of nature can be measured by the foot-rule of the normal curve of 
error ; but yet that it may be a useful instrument has been shown by few 
people more conclusively than by Prof. Pearson himself. 

In the following list will be found those books and articles relating to the subject
of Part II. of this book which are most accessible and likely to be most useful to the
English student. Further references to foreign authors and to earlier writers will of
course be found in the works here mentioned:

Todhunter, I. — History of the Theory of Probability. Especially Arts. 993-1001.
Encyclopædia Britannica. — Article on Probability.
Dictionary of Political Economy (Palgrave's). — Article on Law of Error.
Galton, F. — Inquiries into Human Faculty and its Development.
    Natural Inheritance.
    Family Likeness in Stature. Proc. of Royal Soc., 1886, 1888.
Merriman, M. — Method of Least Squares.
Chauvenet. — Practical and Spherical Astronomy, vol. ii., App.
Edgeworth, Prof. F. Y. — In the London, Edinburgh and Dublin Philosophical
    Magazine and Journal of Science (formerly issued under other similar titles,
    and known as the Literary and Philosophical Magazine), 5th series.
        Vols. 21, 22, 23, 24, 25, 30. Various investigations and examples.
        Vols. 34, 35, 36. Correlation.
        Vol. 41. Asymmetrical law of error.
    In the Journal of the Royal Statistical Society:
        1886 and Jubilee Volume. Methods of Statistics.
        1888 and 1890. Chance in competitive examinations.
        1893 and 1894. Correlation.
        1895. Recent contributions (Pearson's) to theory.
        1896, 1897, and 1898. Miscellaneous applications of the Calculus of Probabilities.
        1899 and 1900. Representation of Statistics by Mathematical Formulae.
        1902. Methods of Representing Statistics not Fulfilling the Normal Law.
        1906. The Generalised Law of Error.
    Report of Committee of British Association on Monetary Standard, 1888.
    Camb. Phil. Soc. Trans., 1885 and 1886. Merits of various means.
    [The exact titles of the above articles may be found from the indexes of the
    volumes mentioned.]
Pearson, Prof. K. — The Chance of Death and other Essays.
    The Grammar of Science, chaps. x., xi.
    Contributions to Mathematical Theory of Evolution in Transactions of Royal
    Society, 1894, 1895, 1896, 1898, and Stat. Soc. Journal, 1896, 1897.
    Probable errors of frequency constants. Royal Soc. Trans., 1898.
    Criterion . . . of Deviations . . . in a Correlated System . . . Phil. Mag.,
    July 1900.
    General Theory of Skew Correlation. Drapers' Company Research Memoir.
Elderton, W. P. — Frequency Curves and Correlation, 1906. [An account of
    Prof. K. Pearson's methods, with applications.]
Biometrika. — Passim.
Venn, Dr J. — The Logic of Chance.
    Nature and Use of Averages. Stat. Journal, 1891.
    Cambridge Anthropometry. Journal of Anthropological Institute, Nov. 1888.
Yule, U. — History of Pauperism. Stat. Journal, 1896.
    Theory of Correlation. Do. 1897.
    Changes in Pauperism. Do. 1899.
    Association of Attributes in Statistics. Royal Soc. Trans., 1900.
    The Decline of Human Fertility. Do. 1906.
Bowley, A. L. — Accuracy of an Average. Stat. Journal, 1897.
    Address to Section F. Do. 1906.
Sheppard, W. F. — On the calculation of the Average Square. Stat. Journal, 1897.
    Use of Auxiliary Curves. Stat. Journal, 1900.
    Normal Correlation. Camb. Phil. Society, vol. xix.
    Normal Distribution and Correlation. Royal Soc. Trans., 1898.
Fechner, G. T. — Kollektivmasslehre, 1897.

See also, List of Works on the Theory of Probability. Stat. Journal, 1906, p. 755.



APPENDIX. 

On the Measurement of an Unsymmetrical Group. 

In the chapter on Averages, and in Section II. of Part II., we 
have developed a system of measurement which applies in its 
entirety only to a symmetrical group. It is true that the state- 
ment of the two quartiles and the median, or of any group of 
percentiles more than three in number, allows us to describe a
group whether symmetrical or not ; but the use of the "dispersion," 
p. 136, or of one of the quantities r, c, ^, or € of Part IL, gives us 
a measurement of the relation of a group to its average, which is 
incomplete till we either know that the group is symmetrical or 
have a measure of its skewness. 

The relative position of the arithmetic average and the 
median, or of the arithmetic average and the mode, or of the two 
quartiles and the median, clearly have some relation to the 
skewness. 

The curve of error throws considerable light on these rela- 
tions, and affords us several means of calculating a measure of 
skewness generally applicable to the description of groups. 

If in the equation on p. 276, we put $x = z\sqrt{2pqn}$, thus making
z finite, and then include terms of the order $n^{-\frac{1}{2}}$, we obtain
that the chance of drawing $pn + x$ white balls

$= P e^{-z^2}\left[1 - 2j\left(z - \tfrac{2}{3}z^3\right)\right]$, where $j = \frac{q-p}{2\sqrt{2pqn}}$.

Replacing z by $\frac{x}{c}$, where $c^2$ is (as before) $2pqn$, we have for
the equation of the curve

$y = \frac{N}{c\sqrt{\pi}}\, e^{-\frac{x^2}{c^2}}\left[1 - 2j\left(\frac{x}{c} - \frac{2}{3}\frac{x^3}{c^3}\right)\right]$,

where the unit of ordinates is so taken that the area of the whole
curve is N, the number of observations.*



* An equivalent formula is given by Westergaard (Die Grundzüge der
Theorie der Statistik, p. 70) from Laplace; see Todhunter's History of the
Theory of Probability, Art. 993.



j is thus equal to $\frac{q-p}{2c}$ [the remaining terms of this equation are illegible in the source].

The abscissa of the centre of gravity is still zero, and the
error of mean square $\frac{c}{\sqrt{2}}$.

j has been so chosen that it is equal to the mean cube of
error, divided by $c^3$ to make it independent of the unit.*

Writing the above equation in its integral form, and putting
$Y_x$ for the area standing on the line CX, where C is the origin, and
X a point on the axis with abscissa x, and $\tau = \frac{x}{c}$, we have

$Y_x = N\{F(\tau) \pm j \cdot f(\tau)\}$.

The double sign is due to the fact that j c must be taken as 



^ — ^ on the/ side of the centre. 

$F(\tau)$ is tabulated on p. 281, $f(\tau)$ on p. 332.
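
The two tabulated functions can be sketched in Python. Both closed forms are assumptions rather than statements from the text: F(τ) is taken as the area of the normal curve between the centre and τ, that is ½ erf(τ), and f(τ) as {1 - (1 - 2τ²)e^(-τ²)}/(3√π), which agrees with the tabulated values. The function names are hypothetical.

```python
import math

def big_f(tau):
    """Assumed form of F(tau): normal-curve area from the centre to tau."""
    return 0.5 * math.erf(tau)

def small_f(tau):
    """Assumed closed form of f(tau), chosen to match the printed table."""
    return (1 - (1 - 2 * tau ** 2) * math.exp(-tau ** 2)) / (3 * math.sqrt(math.pi))

def area_fraction(tau, j):
    """Fraction of the group between the centre of gravity and tau:
    F + j*f when tau is negative, F - j*f when tau is positive."""
    t = abs(tau)
    correction = j * small_f(t)
    return big_f(t) + correction if tau < 0 else big_f(t) - correction

# Two points from the worked example further on (j = .0728).
below = area_fraction(-1.583, 0.0728)   # age 11
above = area_fraction(0.793, 0.0728)    # age 15
```

With j = .0728 the two calls give about .505 and .354, the figures appearing in the worked scheme for ages 11 and 15.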

A tabulation of $Y_x$ for certain values of j will be found in the
Statistical Journal for July 1902.†

j should be calculated by a method similar to that used for cal-
culating c on p. 290; the cubes of errors being taken instead of
the square, and the result divided by $c^3$. An example is given
below.

Professor Edgeworth has shown that, with j so calculated, 
the equation above given is a correct expression for the 
curve of error resulting from the hypotheses stated on p. 303. 
The normal curve of error, discussed in sections I. to V., is the 
first approximation, when j is taken as zero. If j is not small, 
this second approximation breaks down not far from the centre ; 
but if j is not greater than .3, a great part of the curve is 
correctly given. 

The curve, whose equation contains j, intersects the curve
obtained by putting j = 0, when x = 0 or $\pm c\sqrt{\tfrac{3}{2}}$. This is seen on
the accompanying diagram.

* The mean cube of error, divided by $c^3$, is $\frac{1}{Nc^3}\int x^3 y\,dx$, which can be verified as equal to j.

† There $f(\tau)$ does not include $\frac{1}{3\sqrt{\pi}}$, and its sign is reversed.




The median, O, can be shown to be approximately at the
point $\tau = -\tfrac{1}{3}j$; the mode, M, at the point $\tau = -j$; the quartiles,
$Q_1$, $Q_2$, at $\tau = \pm\rho + \tfrac{1}{3}j(2\rho^2 - 1) = \pm .477 - .182j$, where $\rho$ has the
same meaning as on p. 292, and is written for .4769.

Thus j may be calculated from observing any two of the
above positions (when c is known, and the observations reduced
to unit c).

If we do not know c, the positions of these points* are given by

$CO = -\tfrac{1}{3}jc$,

$CM = -jc$,†

$OQ_1 = \rho c - \tfrac{2}{3}\rho^2 jc$,

$OQ_2 = \rho c + \tfrac{2}{3}\rho^2 jc$,

where C is the position on the axis of the centre of gravity.

Thus the observation of the median and two quartiles gives
us a rapid means of calculating j and c; for $c = \frac{OQ_1 + OQ_2}{2\rho}$ and

$j = \frac{OQ_2 - OQ_1}{OQ_2 + OQ_1} \times 3.14$.
This method of calculating j bears nearly the same relation 
to the longer method given below, as the use of the average error 
or probable error (see p. 292) does to the use of the error of 
mean square. 
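
The quartile method can be sketched as below. It assumes that the distances of the lower and upper quartiles from the median are ρc - ⅔ρ²jc and ρc + ⅔ρ²jc, with ρ = .4769, so that the factor multiplying the quartile ratio is 3/(2ρ), about 3.14. The function names and the round-trip check are illustrative, not taken from the text.

```python
RHO = 0.4769  # the constant written rho in the text

def quartile_distances(c, j):
    """Distances of the lower and upper quartiles from the median,
    under the relations assumed in the lead-in."""
    shift = (2 / 3) * RHO ** 2 * j * c
    return RHO * c - shift, RHO * c + shift

def c_and_j_from_quartiles(oq1, oq2):
    """Recover c and j from the two quartile distances."""
    c = (oq1 + oq2) / (2 * RHO)
    j = (oq2 - oq1) / (oq2 + oq1) * 3 / (2 * RHO)
    return c, j

# Round trip with illustrative values c = 1.683, j = .0728.
oq1, oq2 = quartile_distances(1.683, 0.0728)
c, j = c_and_j_from_quartiles(oq1, oq2)
```

Only the median and the two quartiles are needed, which is what makes the method rapid compared with computing the mean cube of error.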

j, calculated by any of these methods, is the measure of the 
skewness of the curve, which it was our object in this section 
to find. 

Even when a group has no very close connection with the 
first or second approximation to the curve of error, it seems 
probable that c and j, calculated by the methods which for the 
particular group seem the least liable to chance disturbance, are 
the best single measures of the grouping about the average and 
the skewness, that we can devise. 

The following example shows the method of calculating c
and j by means of mean square and mean cube of error.‡

* These equations are due to Professor Edgeworth; their use is
investigated in a joint article by him and the present author in the Statistical
Journal, July 1902.

† Hence CM = 3 CO.

‡ Other examples are to be found in the Statistical Journal, loc. cit.



Values of $f(\tau)$ for Different Values of $\tau$, where $f(\tau) = \frac{1}{3\sqrt{\pi}}\left\{1 - (1 - 2\tau^2)e^{-\tau^2}\right\}$.

[Entries in square brackets are illegible in the source and have been computed from the formula.]

  τ.    f(τ).      τ.    f(τ).      τ.    f(τ).      τ.    f(τ).      τ.    f(τ).
 .00    .000      .36    .066      .71    .189     1.06    .264     1.41    .264
 .01   [.000]     .37    .069      .72    .192     1.07   [.265]    1.42    .264
 .02   [.000]     .38    .072      .73    .195     1.08   [.266]    1.43    .263
 .03   [.001]     .39    .076      .74    .198     1.09    .267     1.44    .262
 .04   [.001]     .40    .079      .75   [.201]    1.10    .268     1.45    .262
 .05    .001      .41    .082      .76    .204     1.11    .268     1.46    .261
 .06    .001      .42    .086      .77    .207     1.12    .269     1.47    .260
 .07    .003      .43    .090      .78   [.210]    1.13    .270     1.48   [.259]
 .08    .004      .44    .093      .79    .213     1.14    .270     1.49   [.258]
 .09    .005      .45    .097      .80    .216     1.15    .270     1.50    .257
 .10    .006      .46   [.100]     .81    .218     1.16    .271     1.52    .256
 .11    .007      .47    .104      .82   [.221]    1.17    .271     1.54    .254
 .12    .008      .48    .107      .83    .224     1.18    .271     1.56    .252
 .13    .009      .49   [.111]     .84    .226     1.19    .272     1.58    .250
 .14   [.011]     .50   [.115]     .85    .229     1.20    .272     1.60    .248
 .15    .013      .51   [.118]     .86   [.231]    1.21    .272     1.62    .246
 .16    .014      .52   [.122]     .87    .233     1.22    .272     1.64    .244
 .17    .016      .53    .126      .88    .236     1.23    .272     1.66    .242
 .18    .018      .54    .129      .89    .238     1.24    .272     1.68    .240
 .19   [.020]     .55    .133      .90    .240     1.25    .272     1.70    .238
 .20   [.022]     .56    .137      .91    .242     1.26    .272     1.72    .236
 .21    .024      .57    .140      .92    .244     1.27    .271     1.74    .234
 .22    .026      .58    .144      .93    .246     1.28    .271     1.76    .232
 .23    .029      .59    .148      .94    .248     1.29    .271     1.78    .230
 .24    .031      .60    .151      .95    .249     1.30    .271     1.80    .228
 .25    .033      .61   [.155]     .96    .251     1.31    .270     1.82    .227
 .26    .036      .62   [.158]     .97    .253     1.32    .270     1.84    .225
 .27    .039      .63   [.162]     .98    .254     1.33    .269     1.86    .223
 .28    .041      .64    .165      .99    .256     1.34    .269     1.88   [.221]
 .29    .044      .65    .169     1.00    .257     1.35    .268     1.90   [.220]
 .30    .047      .66   [.172]    1.01    .259     1.36    .268     1.92    .218
 .31    .050      .67    .176     1.02    .260     1.37    .267     1.94    .217
 .32    .053      .68    .179     1.03    .261     1.38    .267     1.96    .215
 .33    .056      .69   [.182]    1.04    .262     1.39    .266     1.98    .214
 .34    .059      .70    .186     1.05    .263     1.40    .265
 .35    .061

The data are the number of school children of various ages
in the sixth grade in the public schools of St Louis, U.S.A.
On the axis of abscissae (z), are measured the ages. One year is
taken as the unit, and 10½ years as the origin from which the
ages are measured. The number in each age group is denoted
by y. $\mu_1'$, $\mu_2'$, $\mu_3'$ are the average first, second, and third powers
of the deviations from the origin. $\mu_1$, $\mu_2$, $\mu_3$ corresponding quanti-
ties measured from $\bar{z}$ ($= \mu_1'$) the centre of gravity.

One student over 19 years is excluded.

Ages.      Numbers at     Corresponding
Years.     each Age.      Values of
           y              z          yz         yz²        yz³
10-11         26          0             0           0           0
11-12        201          1           201         201         201
12-13        673          2         1,346       2,692       5,384
13-14      1,001          3         3,003       9,009      27,027
14-15        739          4         2,956      11,824      47,296
15-16        310          5         1,550       7,750      38,750
16-17         80          6           480       2,880      17,280
17-18         13          7            91         637       4,459
18-19          1          8             8          64         512

     N = 3,044 = Σy                 9,635      35,057     140,909

$\bar{z} = \mu_1' = \frac{\Sigma yz}{\Sigma y} = \frac{9635}{3044} = 3.165$. Average age is $10\tfrac{1}{2} + \bar{z} = 13.665$ (years).

$\mu_2' = \frac{\Sigma yz^2}{N} = 11.517$, $\mu_3' = \frac{\Sigma yz^3}{N} = 46.291$.

$\mu_2 = \frac{\Sigma y(z - \bar{z})^2}{\Sigma y} = \frac{\Sigma yz^2}{\Sigma y} - \bar{z}^2 = \mu_2' - \bar{z}^2$.

$\sqrt{\mu_2}$ is then the error of mean square; but Mr Sheppard * has
shown that when, as in this example, the numbers are supposed
to be at the centres of their groups (at 10½, 11½ years, &c.),
instead of being scattered through them, it is proper to subtract
$\tfrac{1}{12}$ from $\mu_2$, as calculated above.

Then $c = \sqrt{2\left(\mu_2 - \tfrac{1}{12}\right)} = 1.683$ (years).

$\mu_3 = \mu_3' - 3\mu_2'\bar{z} + 2\bar{z}^3 = .347$

Then $j = \frac{\mu_3}{c^3} = .0728$.
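
The calculation can be retraced in Python from the age table. This is a sketch with hypothetical variable names; it carries full precision throughout, whereas the text works from moments rounded to three decimals, so the last figures of c and j differ slightly from 1.683 and .0728.

```python
import math

# Ages measured in years from the origin 10.5 (z = 0 for the 10-11 group).
z_values = list(range(9))
numbers = [26, 201, 673, 1001, 739, 310, 80, 13, 1]   # y, pupils in each group

N = sum(numbers)

def raw_moment(power):
    """Average of z**power over all pupils (moment about the origin)."""
    return sum(y * z ** power for y, z in zip(numbers, z_values)) / N

m1, m2, m3 = raw_moment(1), raw_moment(2), raw_moment(3)

mu2 = m2 - m1 ** 2                       # mean square about the centre of gravity
mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3     # mean cube about the centre of gravity

c = math.sqrt(2 * (mu2 - 1 / 12))        # with the correction of 1/12 for grouping
j = mu3 / c ** 3                         # measure of skewness

mean_age = 10.5 + m1
median_age = mean_age - j * c / 3        # median lies one-third of j*c below the mean
mode_age = mean_age - j * c              # mode lies j*c below the mean
```

The mean age comes out at about 13.665 years and the median at about 13.62, in agreement with the figures in the text.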



* See Statistical Journal, Sept. 1897.






Putting $\tau$ for $\frac{x}{c}$, we have the following scheme for finding the
values of $Y_x$ between the average age and the years 11, 12 . . .

Ages.          x.          τ.        F(τ).    f(τ).    j f(τ).    F(τ) ± j f(τ).*
   -          - ∞         - ∞        .500     .188     .014       .514
11 years    - 2.665     - 1.583      .487     .249     .018       .505
12   "      - 1.665     -  .989      .419     .256     .019       .438
13   "      -  .665     -  .395      .212     .077     .006       .218
14   "      +  .335     +  .199      .111     .022     .002       .109
15   "      + 1.335     +  .793      .369     .214     .015       .354
16   "      + 2.335     + 1.387      .475     .266     .020       .455
17   "      + 3.335     + 1.981      .497     .214     .016       .481
18   "      + 4.335     + 2.575      .500     .191     .014       .486
19   "      + 5.335     + 3.170      .500     .188     .014       .486
   -          + ∞         + ∞        .500     .188     .014       .486

Ages.      Differences     Numbers Calculated        Actual.     Differences
           of Col. 7.      (Col. 8 mult. by N).                  or Errors.
10-11        .009                 27                    26            1
11-12        .067                204                   201            3
12-13        .220                670                   673            3
13-14        .327                995                 1,001            6
14-15        .245                746                   739            7
15-16        .101                307                   310            3
16-17        .026                 79                    80            1
17-18        .005                 15                    13            2
18-19        .000                  0                     1            1
                                                                     27



The sum of the differences is thus 27 on a total of 3044, or
about .9 per cent. 
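
The misfit figure is simply the sum of the absolute errors expressed as a percentage of the whole group. A short Python check, using the calculated and actual numbers of the scheme (variable names are hypothetical):

```python
calculated = [27, 204, 670, 995, 746, 307, 79, 15, 0]   # numbers given by the skew curve
actual = [26, 201, 673, 1001, 739, 310, 80, 13, 1]      # numbers observed at each age

errors = [abs(c - a) for c, a in zip(calculated, actual)]
total_error = sum(errors)
misfit_percent = 100 * total_error / sum(actual)
```

The errors sum to 27, which on 3044 observations is a little under .9 per cent.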

The misfit without the j term is more than five times as 
great, viz., 5.1 per cent. 

The median is at (13.665 - ⅓jc) years, that is at 13.624 years.

The mode is at 13.543 years. 

The normal curve (c = 1.683) and the skew curve (c = 1.683,
j = .0728) are shown on the adjoining diagram, together with the 
observations. If the agreement was perfect the areas of the 
rectangles formed by the ordinates through the points corre- 
sponding to the years, the axis, and the parallels through the 
calculated positions, would equal the areas cut off from the skew 
curve by the same ordinates. 



* Additions when τ is negative.






INDEX. 



(References to definitions are printed in italic in the original, thus: 109.)



Accuracy, 199-214. 

Age, 29, 147, 251, 312, 333. 

Agricultural Wages, 50-52, 97-103, 109, 110, 115-117.

Arithmetic Average or Mean, 107-110, 109, 125, 126, 128, 129, 130, 136, 221,
329; error in, 204, 306.

Asymmetry, see Skewness.

Average error, 285, 292, 329, 331.

Average wage, 6, 11.

Average : Precision of, 305-308. 

Averages, 7, 19, 89, 92, 95, 107-130, 130, 133-140, 143, 214, 264; see
Arithmetic Average, Median, Mode, Weighted Average.

Bertillon, Dr J., 17, 129, 130, 158. 
Bias, 118. 

Biassed errors, 209-214, 219. 
Bibliography : of Interpolation, 258 ; of 

Law of Error, 327. 
Binomial Expansion or Theorem, 265, 

272-277, 288, 291, 301-2, 306. 
Births, 287. 
Blank Forms, 18, 19, 26, 27, 30, 37, 42, 46, 63; specimens of, 23, 35, 36, 45,
48, 51, 52, 54-58, 65, 67, 69.
Boole, 242, 247, 251.
Booth, C., 9, 27, 32, 78-80, 123, 158, 251.
Bortkewitsch, Dr, 302. 

Cartograms, 156-158. 

Census: Population, 10, 11, 23-32, 63,
78-81, 82, 99, 233.

Census: Wage, 11, 12, 33-40, 63, 87, 
92-96, 114, 125, 233. 

Chance, 266, 267 ; see Probability, 

Changes in Wages, 54-58, 61, 97-103. 

Chauvenet, 303. 

Coefficient: of Correlation, 318; of Probability, 299; of Regression, 323.

Coefficients: Statistical, 129, 130, 296, 299.

Collection of material, 17, 18. 

"Combinational" Groups, 299. 

Comparison : Accuracy of, 206, 212, 305. 

Comparisons of Series, 168-177, 192-194; see Correlation.

Consumption: Index No. of, 228.

Correlation, 316, 317-325.

Cotton: wages, 39, 95, 96, 114; trade, 164-167.
Curve of Error, see Error, Law of.
Curves of Frequency, 303.
Cycles of Trade, 153, 181. 

Darwin, G. H., 256. 

De Morgan, 242, 247. 

Deciles, 124, 125-128, 133, 136, 144. 

Demography, 6, 7, 23. 

Deviations, 126. 

Deviation : Standard, 285, 292 ; Quartile, 

282. 
Diagrams, 19, 88, 143-196. 
Dispersion, 136, 140, 329. 

Earnings, 37. 

Economist, The, 11, 214, 221, 223-224.

Edgeworth, Prof. F. Y., 118, 187, 253, 254, 257, 262, 285, 299, 303, 307, 313,
315.
Employment, see Unemployment.

Equation of Regression, 323.

Equation of curve of error: normal, 283; skew, 329.

Error, 201, 203-214, 330.

Error, Curve of, 261, 269-292, 309-310, 311, 329-334; Law of, 5, 267, 303-315.
Error of mean cube, 330, 331.
Error of mean square, 285, 286, 290, 292, 330, 331, 333; see Standard
Deviation.
Error, probable, see Probable Error.
Exports, see Foreign Trade.

Facility Curves, 303.
Fluctuation, 285, 291, 298.
Foreign Trade, 11, 63-70, 148, 151-4, 170-1, 174-7, 188-191, 221-3, 250, 320-2.
Forms of Inquiry, see Blank Forms.
Fox, W., 100, 124. 
French Wages, 39, 40. 
Frequency Curves, 303, 

Gabaglio, Prof. A., 156. 
Galton, F., 89, 126-8, 322, 324. 
Geometric Mean, 128, 129, 221-3.






Giffen, Sir R., 10, 70, 151-4.
Graphic Method, see Diagrams; of In- 
terpolation, 238-9. 
Great Numbers, 8, 263-4. 

Historical Diagrams, 159-167. 
Hours of work, 54, 57, 58. 

Imports, see Foreign Trade. 
Index-Numbers, 111-2, 190, 217-29.
Interpolation, 19, 233-58. 

Jevons, W. S., 128, 178. 

Labour Commission, 37. 

Labour Department, 10,41-62,63,97-103. 

Labour Gazette, The, 12, 41, 44, 46, 48, 

50, 58, 60, 239. 
Labour Statistics, 10.
Large Numbers, 4.
Least Squares, 5, 177.
Le Play, P. G., 7.
Levasseur, P. E., 156. 
Levi, Leone, 9. 

Lexis, Prof. W., 263, 280, 297, 298, 299. 
Logarithmic Curves, 188-196.
Logarithms, Table of, 195-6. 
Luck, 293. 

Makeham's Formula, 257. 

Marriage Rate, 108, 174-7, 193-4, 320-2.

Maximum Ordinate, 119; see Mode.

Mean of Errors ; see Average Error. 

Median, 95, 117, 123, 124, 125-8, 130, 133, 136, 138, 144, 154-6, 221, 224,
290, 323, 329, 331, 334; determination of, 127, 155, 252.

Merriman, M., 256, 284, 303.

Method: of Least Squares, 5, 177; Statistical, 4, 7, 17-20.

Mode, 119, 118-124, 130, 133, 136, 144, 155-6, 329, 331, 334; determination of,
155, 252.
Modulus, 283-5, 288, 289, 290, 291, 299, 303-9, 312, 313, 314, 329-334;
probable errors of, 319.

Normal Curve of Error, 277, 303, 304, 309, 330.

Occupation, 29, 82, 99, 101.
Official Statistics, 9, 10, 213. 

Pearson, Prof. Karl, 5, 316, 318, 319, 

326. 
Percentiles, 124, 329. 
Periodic Figures, 178-87, 240, 315. 
Population, 28; see Census.
Poynting, Prof. J. H., 181.
Precision, 201, 285, 286, 292, 309; scale of, 273; of an average, 305-8; see
Modulus.



Prices, see Index-Numbers. 

Probable Error, 281, 290, 292, 305, 306,
307, 308, 320, 329, 331; of coefficient
of regression, 319; of modulus, 319.

Probability, 20, 2^. 

Purchasing Power, see Index-Numbers.

Quartiles, 95, 124, 125-8, 133, 134,
136, 144, 155, 290, 313, 320, 329, 331.

Quartile Deviation, 282. 

Questions, 24. 

Quetelet, A., 118-9, 124, 272-3, 278, 280,
284, 285.

Regression, 324, 325. 

Retail Prices, 11; Index-Number of, 225-9.
Revenue, 159-61. 

Samples, 20, 219, 224, 225, 308-313. 

Sauerbeck, A., 190, 223-4. 

Sheppard, W. F., 255. 

Skew curve, 329, 334. 

Skewness, 277, 329, 331. 

Small numbers, 301. 

Smoothing, 151-6. 

Standard Deviation, 285, 292.

Statistics, 3, 4, 9, 262; official, 9, 10.

Statistical Abstract, The, 10.

Statistical Coefficients, 129, 130, 296, 299.

Stirling's formula, 280. 

Strikes, 51, 54-62. 

Summary, 17, 19. 

"Symptomatic" Series, 240, 315. 

Tabulation, 17, 18, 24, 73-103, 133-140.
Tellers, 3, 25, 26, 28, 31. 
Trade Unions, 42, 46, 53, 81. 
Type, 6, 124, 130. 

Unbiassed Errors, 209-14, 219. 
Unemployed, 40, 41, 42, 45, 46, 52, 178- 
187, 192-4. 

Venn, Dr J., 262, 266. 

Wage Census, see Census. 

Wage Statistics, 11, 87-92, 120, 134-6,
144-6, 250, 310.

Wages, 54, 57, 58, 61, 149, 150; see 
Agriculture. 

Weighted Average, 111, 112-8; errors in,
205, 207, 214, 219-222, 304.

Westergaard, Prof. H., 287, 329.

Wheat, 161-3, 174-7, 186, 320-2. 

Wholesale Prices, 11 ; see Index-Num- 
bers. 

Wood, G. H., 193, 228. 

Yule, U., 282, 317, 319.



The thick line A0 A1 A2 A3 A4 represents the normal
curve of error.

C1 C2 C3 is a curve of error with the same unit of
abscissae as A1 A2 A3, but with ordinates diminished in the

B1 B2 B3 is a curve of error with both ordinates and
abscissae half those of A1 A2 A3.

The areas of B1 B2 B3 and C1 C2 C3 are equal; but the
modulus of the former is half that of the latter, and it
represents observations of twice the precision.

The area contained between the vertical lines through
P1, P2 and the curve A1 A2 A3 and X O X1 is half the area
between the curve and X O X1; similarly for C1 C2 C3;
similarly with lines through p1, p2 for B1 B2 B3.

P1, P2, p1, p2 are positions of probable errors.

M,. Mo. in..'nL. 



ADDENDA AND CORRIGENDA TO THE 

THIRD EDITION. 

The theory of statistics has developed so rapidly in recent years 
and tables of recent statistics are so quickly out of date that it is 
advisable to add some notes to the former and to complete the 
latter, even during the issue of an edition. It is hoped that 
some day the whole work will be re-cast so as to include these 
and other notes, but events are not yet ripe for this and we are 
still far from finality in the treatment of mathematical statistics. 
The opportunity is also taken to correct some misprints and 
mistakes. 

October 1911.



Chapter III. Section 1.

The Census Schedule for England and Wales for 1911 differs
in important respects from that of 1901. The wording of the 
headings is clearer. Ages of males and of females are given 
in separate columns to prevent mistakes in tabulation. Instead 
of " Condition as to Marriage " we have five columns under 
"Particulars as to Marriage." The first is as column 3 in 1901, 
the others ask for each married woman the number of completed 
years the present marriage has lasted, how many children have 
been born alive and how many are still living, with a column 
for " children who have died " to serve as a check. It is hoped 
that this new information, similar to that already obtained in 
France, Australia, and elsewhere, will throw light on the causes 
of change of the birth-rate and other important demographical 
problems. The columns for profession or occupation are 
restricted to persons aged ten years and upwards and have 
been re-cast. For the old column 6 we have two, one to show 
"the precise branch of Profession, Trade, Manufacture, &c.," 
including "the particular kind of work done and the Article 
made or Material worked or dealt in," the other to show the
"Industry or Service with which [the] worker is connected,"
explained as generally meaning " the business carried on by the 
employer." This will, it is expected, lead to more accurate 
classification. Columns 7-9 of 1901 are reduced to one, and a
new column is added to be filled in by those " carrying on trade 
or industry at home." The heading of column 10 of 1901 is 
re-arranged so as to distinguish between the United Kingdom, 
the rest of the British Empire, and foreign countries ; while 
those born out of England or Wales are to state whether they 
are residents or visitors. A new column is added for nationality 
for persons born in a foreign country. Finally, instead of asking 
for the number of rooms only when less than five, the notice is : 
"Write below the Number of Rooms in this Dwelling (House, 
Tenement, or Apartment). Count the kitchen as a room, but do 
not count scullery, landing, lobby, closet, bathroom ; nor ware- 
house, office, shop." 

No difficulty seems to have been found in obtaining answers 
to everything asked, but there may be some slight inaccuracy in 
the answers as to duration of marriage and number of children. 
The attempted abstention of some Suffragists is not likely to 
have any perceptible effect on any part of the census, except that 
a very few occupied women may be counted as unoccupied. 

Electric machinery is being used extensively in working up 
the tables. 

Section 2. 

A second Wage Census was carried out in 1906. The 
general lines were similar to that of 1886. Specimens of the 
schedules used are printed in the Report on the Textile 
Trades (Cd. 4545). In 1886 it was found that returns of the 
wages of individuals were often not given, but only the average 
for a group engaged in one occupation. This was avoided in 
1906. Actual earnings of all individuals in the last pay-week of 
September 1906 were obtained instead of the hypothetical earn- 
ings of a normal week as in 1886, and those working the normal 
week are separated from those working over or under time. In 
the tabulation the arithmetic average, the median, and the 
quartiles are given for the principal occupations in the principal 
districts. The publication of the results has been greatly delayed. 
It is feared that much difficulty will be found in making general 
comparisons with the previous census. 




A Census of Production was carried out by the Board of 
Trade for the year 1907. The first of the series of publications 
of the results is Cd. 4896. 

Section 3. 

The work of the Labour Department has not changed 
its character. The current Labour Gazette and the current 
Abstract of Labour Statistics (Cd. 5458) will show how its 
various statistical activities have developed. 

Section 4. 

Page 63, lines 17 and 18. After "despatched" add "and 
place of destination, &c., to the Custom House officials and so." 

Line 25. Read "verified by officials . . . having been filled 
in by agents." 

Page 65. In heading read "which" for "whence," and at
foot "pro Collector."

Page 66, line 12. For "twenty" read "ten."

Line 13. For "takes no" read "receives an."

Page 67, last two lines, and top of page 68. The clerk's duty
is to require the agent to complete the forms, if imperfect, and 
to test the values by current price lists, &c. If he makes entries 
himself it is irregular. Similarly, on page 68, line 22, this pro- 
ceeding is irregular. It is possible, but it is believed to be 
exceptional, that unusual entries are avoided by the clerk or 
agent to save trouble. 

Page 68, last line but 2. After "sent" insert "through 
the Custom House officials or directly." 

Page 70. Definitions of imports and exports, with a state- 
ment as to how the data are obtained, are now given in the 
introductory pages of the Annual Statement of the Trade of 
the United Kingdom. To the statements on page 70 should
be added that the following are excluded: From Imports, fish
of British taking landed in British ships arriving direct from the
fishing grounds; goods directly imported by ambassadors and
ministers accredited to this kingdom; old vessels bought from
foreigners. From Exports, old vessels sold to foreigners. From
Imports and Exports, sacks, cases, &c., used as packages ; ships' 
stores, ballast, and military and naval stores on board Govern- 
ment vessels ; goods transhipped under bond, and goods in 
transit through the country on a through bill of lading, of which
separate accounts are given. Coin is treated in the same way as
bullion. 

Pages 63, 64, 67, and 69. Since 1904 a column has been
added to the form as given on page 67 and allied forms headed
"Name of place whence goods consigned," and to that on page
69 headed "Final Destination of the Goods," and in the heading
of the form, page 67, "whence" is replaced by "of Shipment
of Goods." The corresponding form for foreign and colonial
goods has two columns for "Final Destination of the Goods"
and "Country whence goods were consigned when imported."

Prior to this the statistics of imports and exports were neces- 
sarily classified according to the countries from or to which they 
were shipped, in most cases, so that Switzerland, for example, 
was not named in the trade accounts. Now, additional tables 
are given showing the countries to or from which they are con- 
signed. These are still not necessarily the countries of origin or 
of ultimate consumption. (See Committee on Trades Records 
(Cd. 4346) and compare a current Statistical Abstract of the 
United Kingdom with one prior to 1905. For general defini- 
tions of international trade see the Reports of the Committee of 
the British Association on The Accuracy . . . of . . . Statistics
of International Trade, 1904 and 1905.) 

Chapter V. 
Page 128, last line but 1. For "57" read "57.7."



Chapter VII. 
The figures for continuing the table on page 151 are:

                              Averages.
Year.                Three        Five         Ten
                     Yearly.      Yearly.      Yearly.
1907     416.0*      369.1        338.0        301.1
1908     366.5*      383.2        354.0        314.4
1909     372.3       384.9        369.2        326.1
1910     421.6*      386.8        388.7        333.9



Page 160. These statistics cannot be carried forward with- 
out special difficulty, for current statements of Imperial revenue 
include sums which, prior to 1907-8, were paid direct to local 
taxation accounts. (See Statistical Abstract for 1910, pp. 3 
and 33.) 






Page 163. Continuation of table:

          A.               B.               C.               D.
Year.     Total Quanti-    Total Value      Quantity         Average Price of
          ties Imported.   Imported.        Retained per     Wheat and Wheat
          Unit             Unit             Head of the      Flour in Shillings
          100,000 cwt.     £100,000.        Population.      per Cwt.
1906      1,127            395              275 lbs.         7.01
1907      1,156            440              281  „           7.61
1908      1,091            454              262  „           8.32
1909      1,132            516              273  „           9.12
1910      1,191            497              286  „           8.35



For columns A and D the wheat meal and flour imported
are replaced by their equivalent weight in grain, 1 cwt. of meal
and flour being obtained from and equivalent to 1.40 cwt. of
grain. For column C this correction has not been made; the
difference can be seen by comparison with Table 42 of the
Statistical Abstract for 1910. Since the population of 1911 has
proved to be about 1 per cent. less than the official estimate,
which was used for column C, figures from 1901 to 1905 need a
very small addition. The general shape of line C in the diagram
is not affected perceptibly by either consideration.
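The relation between columns A, B and D can be illustrated with a small sketch. It assumes that column D is simply the value in column B divided by the quantity in column A, converted from pounds to shillings (£1 = 20s.); the text does not state this rule, but the printed figures are consistent with it. The tuple layout is illustrative.

```python
# (year, A: imports in units of 100,000 cwt., B: value in units of 100,000 pounds,
#  D: printed average price in shillings per cwt.)
rows = [
    (1906, 1127, 395, 7.01),
    (1907, 1156, 440, 7.61),
    (1908, 1091, 454, 8.32),
    (1909, 1132, 516, 9.12),
    (1910, 1191, 497, 8.35),
]

computed = {}
for year, a, b, printed in rows:
    # Units of A and B cancel, leaving pounds per cwt.; 20 shillings to the pound.
    computed[year] = 20 * b / a
    print(year, round(computed[year], 2), printed)
```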

Page 164. Continuation of table:

          Piece Goods Exported.           Raw Cotton Imported.
Year.     Quantity,       Value,          Quantity,       Value,          Price per
          000,000's       000's           000's           000's           Cwt.
          omitted.        omitted.        omitted.        omitted.
          Yards.          £               cwts.           £               £
1906      6,261           [gap]           17,923          55,750          3.11
1907      6,298           81,049          21,312          70,458          3.31
1908      5,531           70,231          18,399          55,835          3.03
1909      5,722           68,279          19,543          60,295          3.09
1910      6,018           78,685          17,614          71,712          4.07



Page 170, last line but 5. For "it is not possible" read 
"it was not possible prior to 1904." 
Page 171. Continuation of table:

        Total      Total Exports   Exports to     Exports to   Imports from   Imports from   Imports from
Year.   Imports.   including       British        Foreign      British        Foreign        Germany, Holland
                   Re-exports.     Possessions.   Countries.   Possessions.   Countries.     and Belgium.
1906    6,079      4,607           1,306          3,300        1,422          4,657          1,037
1907    6,458      5,180           1,475          3,705        1,571          4,887          1,039
1908    5,930      4,567           1,357          3,211        1,298          4,631          1,015
1909    6,247      4,695           1,363          3,332        1,469          4,778          1,067
1910    6,782      5,341           1,574          3,767        1,706          5,076          1,141






Page 174. Continuation of table:

Year.    Marriage    Total Exports     Average Price
         Rate.       and Imports       of Wheat
                     per Head.         per Quarter.
                       £   s.            s.   d.
1897     16.0         18  13            30    2
1898     16.2         18  19            34    0
1899     16.5         20   0            25    8
1900     16.0         21   6            26   11
1901     15.9         20  19            26    9
1902     15.9         20  19            28    1
1903     15.7         21   8            26    9
1904     15.3         21  13            28    4
1905     15.3         22  13            29    8
1906     15.7         24  13            28    3
1907     15.9         26  12            30    7
1908     15.1         23  16            32    0
1909     14.7         24  12            36   11
1910     14.9         27   0            31    8



Section 3. 

The International Institute of Statistics has considered the 
possibility of standardising historical diagrams for comparison, 
and resolved at its meeting in 1911 that the average of the
figures for the years 1901-10 should be taken as the standard
and that this average should be represented by a vertical height
equal to the horizontal measurement that represented thirty
years. Diagrams drawn on this standardised scale can then
readily be compared with one another whatever quantities they
represent. It is not intended to prevent other comparisons
being made (as, for example, those on the diagram facing p.
172), nor diagrams that represent series all expressed in the same
units (£ or tons) being drawn with the same natural unit. The
intention is that the standard should be adopted as the only 
form where there is no reason to the contrary, and as an alterna- 
tive form in other cases. Comparison, especially of international 
statistics, will be greatly facilitated if these rules are followed. 
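The standardising rule can be expressed as a scale factor, sketched below. The function name and the choice of 2 drawing units per year are hypothetical; the series used is the marriage rate for 1901-10 from the continuation of the table on page 174.

```python
def vertical_scale(values_1901_10, units_per_year):
    """Drawing units per unit of the quantity plotted.

    The average for 1901-10 is to be drawn as tall as thirty years are wide.
    """
    standard = sum(values_1901_10) / len(values_1901_10)
    return 30 * units_per_year / standard

marriage_rate = [15.9, 15.9, 15.7, 15.3, 15.3, 15.7, 15.9, 15.1, 14.7, 14.9]
units_per_year = 2.0        # hypothetical: 2 mm of paper to each year
scale = vertical_scale(marriage_rate, units_per_year)
average = sum(marriage_rate) / len(marriage_rate)

# The height of the 1901-10 average and the width of thirty years should coincide.
print(round(scale * average, 6), 30 * units_per_year)
```

Any other series drawn with its own scale factor on the same horizontal scale then has its 1901-10 average at the same height, which is what makes the diagrams comparable.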

Page 191. Continuation of table:

Year.    Imports.*    Logarithms.    Exports.†    Logarithms.
         £ mln.                      £ mln.
1907     646          2.810          518          2.714
1908     593          2.773          457          2.659
1909     625          2.796          469          2.672
1910     678          2.831          534          2.728




Chapter VIII. 

Page 203, last line but 4. For " ^ " read " ^." 

600 600 

Page 208, last line but 14. At end read "r^ " for "r." 

13. For "^" read "J"; for "^" read 



>i n 



u 1 )) 






12. For «^" read " -A"; for "t^" 

read'^T^V" 
 „   last line but 2. For "- .006" read "+ .0062."

 „   last line but 1. For "+ .008" read "+ .0081."

 „   last line. For "- .001" read "- .0008."

Page 209, line 5. For "+.001" read "+.0135."

 „   line 7. For "1.338" read "1.3376."

 „   line 8. For "1.335" read "1.3530"; and for the
fraction read ".0154/1.338 = .0115."

 „   line 10. For "1" read "2."

 „   line 15. For "33.8" read "33.76."

 „   line 16. For "33.5" read "35.3"; and for ".01"
read ".046."






Chapter IX. 

Page 221, second table, explanation of first heading. Let $q_1, q_2, \ldots$
be quantities, and $p_1, p_2, \ldots$ prices in 1881, and let $c_1, c_2, \ldots$ be
quantities and $r_1p_1, r_2p_2, \ldots$ prices in 1895. The index number was
obtained from $100 \times \frac{\sum c\,r\,p}{\sum c\,p}$, where the numerator is the declared value,
the denominator the value of 1895 goods at 1881 prices. The weights
of $r_1, r_2, \ldots$ are therefore $c_1p_1, c_2p_2, \ldots$
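A minimal sketch of this index number follows, assuming the reading 100 × Σ(c·r·p)/Σ(c·p) with c the later quantities, p the earlier prices and r the price ratios. The figures are hypothetical, and both function names are illustrative. It shows that the index is the same thing as the average of the ratios r weighted by c·p.

```python
def index_number(c, p, r):
    """100 x declared value / value of the same goods at the earlier prices."""
    declared = sum(ci * ri * pi for ci, pi, ri in zip(c, p, r))
    at_old_prices = sum(ci * pi for ci, pi in zip(c, p))
    return 100 * declared / at_old_prices

def weighted_mean_ratio(c, p, r):
    """Average of the price ratios r with weights c*p, expressed per cent."""
    weights = [ci * pi for ci, pi in zip(c, p)]
    return 100 * sum(w * ri for w, ri in zip(weights, r)) / sum(weights)

c = [40.0, 10.0, 25.0]   # hypothetical quantities in the later year
p = [2.0, 5.0, 1.2]      # hypothetical prices in the earlier year
r = [0.8, 1.1, 0.9]      # hypothetical ratios of later to earlier price

print(index_number(c, p, r), weighted_mean_ratio(c, p, r))
```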

Page 227, line 5. For " 73.2 " read " 79.3." 

Chapter X. 

Page 236, line 23. For " B " read " C." 

Page 243, last line but 1. Insert ". . ." after the second bracket in
each denominator.

Page 251, last table. For "12½, 17½," &c., read "15, 20," &c.;
and on page 252 read "40" for "35" in line 3, put the corrected
values for $x_1$, $x_2$, &c., in the fractions, and obtain "4299.3" instead of
".4412."

Page 253, line 16. Read " (25 " for " 2)5." 

Page 256, line 10. For ''y^'* *'y^ " read ''y^^," "j'-r" 

„ 20. Read "$\Sigma(a + bx + cx^2 + dx^3 - y)^2$."

28. For ">'" read" V' 




Part II. — Section II. 

Page 271, line 9. Read "in" for "is."

Page 288. In the table, for "51.44" read "51.43 " throughout. 

Section III. 
Page 299, last line but 6. For "n" in the denominator read "√n."

Section V. 

Page 308, line i. For "whole" read "average of." 
Page 312, line 11. For "precision" read "modulus." 
Page 313, line 9. For "probable errors" read "moduli." 
Pages 313 and 314. Example of periodicity. The numbers
given do not agree with those on page 179, and the source of the
divergence cannot now be discovered. The modulus for the 240
separate entries 1855 to 1874 is about 7.0 and the average 8.35; i.e.,
about five-sixths of the entries are within 8.35 ± 7.0. The modulus for
comparison of the average for a selected month with the general average
is $7.0\sqrt{\tfrac{1}{20}+\tfrac{1}{240}} = 1.63$. July, August, September, January, and
February differ from the general average by a quantity about equal to
the modulus, and in the case of any month this might happen by
chance. The modulus for comparison of the averages of two selected
months is $7.0\sqrt{\tfrac{1}{20}+\tfrac{1}{20}} = 2.2$. August differs from December by half
as much again, an unusual but not impossible occurrence, if fortuitous.
The evidence for periodicity arises in fact from the consecutiveness of
the bad and of the good months; the difference between the average
for July to September and that for December to February is 3.0, while
the modulus for the difference between two three-monthly periods is
only $7.0\sqrt{\tfrac{1}{30}} = 1.26$. This example illustrates the difficulty of testing
periodicity.
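The three moduli can be recomputed with the sketch below. It assumes that the modulus of an average of k entries is the modulus of a single entry divided by √k, and that moduli of independent averages combine by adding squares. Taking sixty entries (three months in each of twenty years) for each three-monthly average is an assumption, as the fraction is not legible in the source; on that assumption the result is about 1.28 against the printed 1.26.

```python
from math import sqrt

modulus = 7.0   # modulus of the 240 separate entries, as quoted

# One month's average (20 entries) against the general average (240 entries).
month_vs_general = modulus * sqrt(1 / 20 + 1 / 240)

# Averages of two selected months (20 entries each).
two_months = modulus * sqrt(1 / 20 + 1 / 20)

# Two three-monthly averages, assumed to rest on 60 entries each.
two_quarters = modulus * sqrt(1 / 60 + 1 / 60)

print(round(month_vs_general, 2), round(two_months, 1), round(two_quarters, 2))
```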

Section VI. 

Page 321. In 1852 for " 145 " and " 92 " read " 141 " and " 90" ; 
in 1863 for ".64" read ".04"; in 1864 for "63" read "65." In the 
sequel ^xy= -4Sij and hence r= -.29. 

Page 322. In 1884 for " .17 " read " .07," and carry the correction, 
which is of no practical importance, on. 

As to pages 321 and 322. When the correlation between two
symptomatic series (see p. 240) is sought it is better to eliminate the
"symptom" or "trend" by taking the differences, not from the general
average, as on page 321, but from a moving average, as that calculated
on page 151 and represented by the thick line on the accompanying
diagram. Thus, on page 321, the averages for 1845-9 are 16.5 and
54s. 0d., and the differences for 1847, the central year of the period,
are - .7 and + 15s. 9d. Otherwise we shall be correlating trends as well as
differences. The length of the period to be averaged depends on
circumstances.
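The effect described can be illustrated with a sketch on artificial figures. Two hypothetical series share a rising trend but swing in opposite directions about it; the five-year span and all names are illustrative choices, not taken from the text.

```python
def moving_average(series, span=5):
    """Centred moving average; the first and last span//2 years are dropped."""
    half = span // 2
    return [sum(series[i - half:i + half + 1]) / span
            for i in range(half, len(series) - half)]

def detrended(series, span=5):
    """Differences of each year from the moving average centred on it."""
    half = span // 2
    trend = moving_average(series, span)
    return [v - t for v, t in zip(series[half:len(series) - half], trend)]

def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

swing = [1, -1] * 10                              # hypothetical yearly fluctuation
x = [t + s for t, s in enumerate(swing)]          # rising trend plus the swing
y = [t - s for t, s in enumerate(swing)]          # same trend, opposite swing

raw = correlation(x, y)                           # dominated by the common trend
adjusted = correlation(detrended(x), detrended(y))
print(round(raw, 3), round(adjusted, 3))
```

Differences from the general average give a high positive coefficient here because both series rise; differences from the moving average show the opposed fluctuations.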

Appendix. 

Page 333, last line but 1. Insert "=" before "$\mu_2'$."



Mathematical Notes Illustrating and Explaining certain 

Pages in Part II. 

Stirling's Formula (p. 280) and the Deduction of the Equation of the
Normal Curve of Error (pp. 275-7 and 329).

It is proved in the trigonometry text-books that

$$\sqrt{\pi} = \frac{2^{2m}\,(m!)^2}{\sqrt{m}\,(2m)!}$$

when $\frac{1}{m}$ is neglected.

If, on pages 275 and 276, we take the special case $p = \tfrac{1}{2} = q$ and
write $2m$ for $n$, we have therefore

$$P_0 = \frac{(2m)!}{(m!)^2}\left(\frac{1}{2}\right)^{2m} = \frac{1}{\sqrt{\pi m}}.$$

Using the analysis on page 276 with $p = \tfrac{1}{2} = q$, and writing $c^2 = 2pqn = m$,
we find

$$y = P_x = \frac{1}{\sqrt{\pi m}}\,e^{-x^2/m} = \frac{1}{c\sqrt{\pi}}\,e^{-x^2/c^2},$$

thus obtaining the constant multiplier.

Now

$$\left(\frac{1}{2}+\frac{1}{2}\right)^{2m} = \sum_{x=-m}^{m} P_x = \int_{-\infty}^{\infty} y\,dx,$$

where $dx$ is written for the distance on the horizontal axis between two
consecutive ordinates and $m$ is indefinitely increased.

Hence we have a proof of the value of the very important integral
used on page 283:

$$\int_{-\infty}^{\infty} \frac{1}{c\sqrt{\pi}}\,e^{-x^2/c^2}\,dx = 1, \quad\text{and}\quad \int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}.$$
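The approximation for the central term can be tried numerically. The sketch below compares the exact value of $\frac{(2m)!}{(m!)^2}(\tfrac{1}{2})^{2m}$ with $1/\sqrt{\pi m}$ for a few values of m chosen only for illustration; the ratio approaches unity as m increases.

```python
from math import comb, pi, sqrt

ratios = {}
for m in (5, 50, 500):
    exact = comb(2 * m, m) / 4 ** m     # central term of the binomial (1/2 + 1/2)^(2m)
    approx = 1 / sqrt(pi * m)           # value given when 1/m is neglected
    ratios[m] = exact / approx
    print(m, exact, approx, ratios[m])
```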




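To obtain the corresponding result when $p$ and $q$ are unequal we
need Stirling's formula (quoted on p. 280), of which the following
outline proof may be offered.

It is shown in Todhunter's Algebra that if $m$ is large and $\frac{1}{m^2}$
neglected

$$1^r + 2^r + \ldots + m^r = \frac{m^{r+1}}{r+1} + \frac{m^r}{2}.$$

Let

$$s = \frac{(2m)!}{m!\,(2m+1)^m} = \left(1-\frac{1}{2m+1}\right)\left(1-\frac{2}{2m+1}\right)\ldots\left(1-\frac{m}{2m+1}\right) \text{ identically.}$$

Then

$$-\log s = -\sum_{t=1}^{m}\log\left(1-\frac{t}{2m+1}\right) = \sum_{r=1}^{\infty}\frac{1^r+2^r+\ldots+m^r}{r\,(2m+1)^r}$$

$$= \sum_{r=1}^{\infty}\left\{\frac{m\,k^r}{r(r+1)} + \frac{k^r}{2r}\right\}, \text{ if } k = \frac{m}{2m+1},$$

$$= m\left[k\left(1-\tfrac{1}{2}\right) + k^2\left(\tfrac{1}{2}-\tfrac{1}{3}\right) + \ldots\right] - \tfrac{1}{2}\log(1-k)$$

$$= m\left[-\log(1-k) + 1 + \frac{1}{k}\log(1-k)\right] - \tfrac{1}{2}\log(1-k)$$

$$= \left(m+\tfrac{1}{2}\right)\log\frac{m+1}{2m+1} + m$$

$$= \frac{2m+1}{2}\left[\log\left(1+\frac{1}{2m+1}\right) - \log 2\right] + m = \tfrac{1}{2}\log e - \log 2^{m+\frac{1}{2}} + m.$$

$$\therefore\ \frac{(2m)!}{m!\,(2m+1)^m} = \left(\frac{2}{e}\right)^{m+\frac{1}{2}}.$$

Now eliminate $(2m)!$ by Wallis' formula and we have

$$m! = (2m)^m\left(1+\frac{1}{2m}\right)^m\left(\frac{2}{e}\right)^{m+\frac{1}{2}}(m\pi)^{\frac{1}{2}}\,2^{-2m} = m^m e^{-m}\sqrt{2\pi m},$$

when $\frac{1}{m}$ is neglected throughout.

This is the first approximation to Stirling's formula.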
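As a numerical illustration, the first approximation $m^m e^{-m}\sqrt{2\pi m}$ may be set against the exact factorial. The values of m below are arbitrary, and the function name is illustrative; the ratio of the exact to the approximate value exceeds unity by a quantity of the order of 1/m.

```python
from math import exp, factorial, pi, sqrt

def stirling(m):
    """First approximation to m!, with 1/m neglected."""
    return float(m) ** m * exp(-m) * sqrt(2 * pi * m)

excess = {}
for m in (1, 10, 100):
    excess[m] = factorial(m) / stirling(m) - 1
    print(m, factorial(m), stirling(m), excess[m])
```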

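The proof on page 276 can now be put in a simpler form:

$$y = P_x = \frac{n!}{(pn+x)!\,(qn-x)!}\,p^{pn+x}\,q^{qn-x},$$

where $p + q = 1$.

Replace the factorials by Stirling's formula, simplify, and take
logarithms. Then

$$\log\left[y\sqrt{2\pi pqn}\right] = -\left(pn + x + \tfrac{1}{2}\right)\log\left(1+\frac{x}{pn}\right) - \left(qn - x + \tfrac{1}{2}\right)\log\left(1-\frac{x}{qn}\right).$$

Write $x = x'\sqrt{n}$, expand the right-hand side, and neglect all terms
involving $n^{-1}$ or higher negative powers of $n$ as on page 277:

$$\log\left[y\sqrt{2\pi pqn}\right]$$

$$= -\left(pn + x'\sqrt{n} + \tfrac{1}{2}\right)\left(\frac{x'}{p\sqrt{n}} - \frac{1}{2}\frac{x'^2}{p^2 n} + \frac{1}{3}\frac{x'^3}{p^3 n^{3/2}} - \frac{1}{4}\frac{x'^4}{p^4 n^2} + \ldots\right)$$

$$\quad + \left(qn - x'\sqrt{n} + \tfrac{1}{2}\right)\left(\frac{x'}{q\sqrt{n}} + \frac{1}{2}\frac{x'^2}{q^2 n} + \frac{1}{3}\frac{x'^3}{q^3 n^{3/2}} + \frac{1}{4}\frac{x'^4}{q^4 n^2} + \ldots\right)$$

$$= -\frac{x'^2}{2}\left(\frac{1}{p}+\frac{1}{q}\right) + \frac{x'}{2\sqrt{n}}\left(\frac{1}{q}-\frac{1}{p}\right) + \frac{x'^3}{6\sqrt{n}}\left(\frac{1}{p^2}-\frac{1}{q^2}\right)$$

$$= -\frac{x'^2}{2pq} + \frac{q-p}{6pq\sqrt{n}}\left(\frac{x'^3}{pq} - 3x'\right), \text{ since } p + q = 1.$$

Now write $c^2 = 2pqn$ and $j = \frac{q-p}{2c}$, as on pages 277, 287, and 329, and
restore $x$.

$$y = \frac{1}{c\sqrt{\pi}}\,e^{-x^2/c^2}\,e^{-2j\left(\frac{x}{c} - \frac{2}{3}\frac{x^3}{c^3}\right)}$$

$$= \frac{1}{c\sqrt{\pi}}\,e^{-x^2/c^2}\left\{1 - 2j\left(\frac{x}{c} - \frac{2}{3}\frac{x^3}{c^3}\right)\right\}, \text{ if } j^2 \text{ can be neglected.}$$

We obtain the first approximation, that is the normal curve, by
neglecting $j$.

If we needed a further approximation, involving $n^{-1}$, we should have
to take a further term also in Stirling's formula.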

The Relation of an Average or Sum to the Normal Curve.

A fairly simple proof of the fundamentally important proposition of
page 303 can be adapted from Professor Edgeworth's paper on the
"Law of Error" (Camb. Phil. Trans., Vol. XX., Part I., 1904).

It depends on the use of moments, which have not been defined in
the text of this book, but are used on page 333. If $y = f(x)$ is the
equation of a curve, $\int x^n\,y\,dx$ divided by the area
of the curve is called the $n$th moment of the curve about the axis of $y$.
The first moment gives the abscissa of the centre of gravity; the
second gives the mean square of error ($\sigma^2$) (in dynamics it is the square
of the radius of gyration); the third gives the mean cube of error
(p. 330, footnote), and so on. Professor Karl Pearson has made great 
use of the hypothesis that curves are similar statistically, if they have 
the same area and their successive moments are equal. Equality 
between the first, second, and third moments of the observations and of 
the theoretical curve is illustrated on pages 284, 285, and 333. The 
centre of gravity is generally taken as having zero abscissa, and cal- 
culations are made as on page 333. 

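The moments of the normal curve of error are readily found by the
use of partial integration.

$$\int_{-\infty}^{\infty} x^n e^{-x^2/2\sigma^2}\,dx = \left[-\sigma^2 x^{n-1} e^{-x^2/2\sigma^2}\right]_{-\infty}^{\infty} + (n-1)\,\sigma^2\int_{-\infty}^{\infty} x^{n-2} e^{-x^2/2\sigma^2}\,dx.$$

Now $\left[x^{n-1} e^{-x^2/2\sigma^2}\right]_{-\infty}^{\infty} = 0$.

∴ The $n$th moment of the normal curve $= M_n$, say,

$$= \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} x^n e^{-x^2/2\sigma^2}\,dx = (n-1)\,\sigma^2 M_{n-2},$$

the area being unity.

Now $M_1 = 0$ and $M_2 = \sigma^2$. Hence every odd moment is zero, and
$M_{2t} = \sigma^{2t}(2t-1)(2t-3)\ldots 3\cdot 1 = \frac{(2t)!}{2^t\,t!}\,\sigma^{2t}$. (Thus $M_4 = 3\sigma^4$.)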
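The formula for the even moments can be checked by direct numerical integration. The sketch below uses the midpoint rule over a range of twelve standard deviations either way; the step count, the range and the value σ = 1.5 are arbitrary illustrative choices.

```python
from math import exp, factorial, pi, sqrt

def normal_moment(n, sigma, steps=20_000, width=12.0):
    """n-th moment of the normal curve, by the midpoint rule over +/- width*sigma."""
    h = 2 * width * sigma / steps
    total = 0.0
    for k in range(steps):
        x = -width * sigma + (k + 0.5) * h
        total += x**n * exp(-x * x / (2 * sigma**2)) * h
    return total / (sigma * sqrt(2 * pi))

def even_moment_formula(t, sigma):
    """(2t)! / (2^t t!) * sigma^(2t)."""
    return factorial(2 * t) / (2**t * factorial(t)) * sigma ** (2 * t)

sigma = 1.5
for t in (1, 2, 3):
    print(2 * t, normal_moment(2 * t, sigma), even_moment_formula(t, sigma))
print(3, normal_moment(3, sigma))   # an odd moment, expected to vanish
```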

In the notation of page 303 let $\eta = a_1\epsilon_1 + a_2\epsilon_2 + \ldots + a_r\epsilon_r + \ldots
+ a_m\epsilon_m$, where $a_1, a_2 \ldots$ are constants. Let $\epsilon_r$, any one of the $\epsilon$'s, belong
to, and be measured from the centre of gravity of, a curve of frequency
of small effective range, and let the effective range and the standard
deviation be of the same order as a small quantity $\gamma$. Let $m$ be large,
and let the quantities be so related that $m\,a^2\gamma^2$ is finite, where $a$ is a
finite quantity not differing greatly from $a_1, a_2 \ldots$ or $a_m$.

The ranges being small, there is no difficulty in supposing that any
moment of one of the curves, say the $t$th, is of order $\gamma^t$. This rules out
abnormal deviations of any element. The contribution of any element,
$\epsilon_r$, to a value of $\eta$ is to be independent of that of any other element; and
the different values obtained from the same (the $r$th) curve of frequency
in different values of $\eta$ are to be independent.

We have identically

$$e^{\chi\eta} = e^{a_1\epsilon_1\chi + a_2\epsilon_2\chi + \ldots}$$



where $\chi$ is any quantity, merely used as a carrier of the other quantities.
It is convenient to regard $\chi$ as small, so that the series used may
converge rapidly.

Then

$$1 + \chi\eta + \frac{1}{2!}\chi^2\eta^2 + \ldots = \left(1 + \chi a_1\epsilon_1 + \frac{1}{2!}\chi^2 a_1^2\epsilon_1^2 + \ldots\right)\left(1 + \chi a_2\epsilon_2 + \frac{1}{2!}\chi^2 a_2^2\epsilon_2^2 + \ldots\right)\ldots \qquad (1)$$

Suppose a very large number, N, of values of $\eta$ to be taken, similar
equations to be written down for each, and then summed and the
sum to be divided by N. The left-hand side becomes
$1 + \frac{1}{2!}\chi^2\mu_2 + \frac{1}{3!}\chi^3\mu_3 + \ldots$, where $\mu_t$ is the $t$th moment of the frequency curve
of $\eta$; $\mu_1$ is zero since all quantities are measured from their centre of
gravity.

The right-hand side is then the mean of the products of such
quantities as in the brackets of equation (1), and this equals (when N is
large enough) the product of the means of the brackets. (See note,
p. 350 below.)

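The mean of

$$1 + \chi a_r\epsilon_r + \frac{1}{2!}\chi^2 a_r^2\epsilon_r^2 + \ldots = 1 + \frac{1}{2!}\chi^2 a_r^2\cdot{}_r\mu_2 + \frac{1}{3!}\chi^3 a_r^3\cdot{}_r\mu_3 + \ldots,$$

where ${}_r\mu_t$ is the $t$th moment for the $r$th curve and is of order $\gamma^t$.
Now take logarithms of the sums. We have

$$\log\left(1 + \frac{1}{2!}\chi^2\mu_2 + \ldots\right) = \sum_r \log\left(1 + \frac{1}{2!}\chi^2 a_r^2\cdot{}_r\mu_2 + \ldots\right)$$

$$= \sum_r \frac{1}{2}\chi^2 a_r^2\cdot{}_r\mu_2 \pm \text{terms involving } {}_r\mu_3,\ {}_r\mu_4, \text{ &c.}$$

Let $\sum a_r^2\cdot{}_r\mu_2 = \sigma^2$. Then by hypothesis $\sigma^2$ is finite.

$m\,{}_r\mu_3$, $m\,{}_r\mu_4$, $m\,{}_r\mu_5 \ldots$ are of order $m\gamma^3$, $m\gamma^4$, $m\gamma^5 \ldots$, i.e., $m^{-\frac{1}{2}}$, $m^{-1}$,
$m^{-\frac{3}{2}} \ldots$, since $\gamma^2$ is of order $m^{-1}$, and such a term as $\sum a_r^3$ is of order
$m a^3$, that is, of order $m$, since the $a$'s are finite.

Neglect all terms of order $m^{-\frac{1}{2}}$, $m^{-1} \ldots$, and higher negative orders,
and we have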



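$$1 + \frac{1}{2!}\chi^2\mu_2 + \frac{1}{3!}\chi^3\mu_3 + \ldots + \frac{1}{t!}\chi^t\mu_t + \ldots = e^{\frac{1}{2}\chi^2\sigma^2}.$$

Equating coefficients, $\mu_{2t+1} = 0$ and $\mu_{2t} = \frac{(2t)!}{2^t\,t!}\,\sigma^{2t}$ for all integral values of $t$.

∴ $\mu_s$ is the $s$th moment of the normal curve $y = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/2\sigma^2}$ (see
p. 346 above) for all integral values of $s$.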

Hence this normal curve and the curve of frequency of $\eta$ have the
same moments of all orders.

Hence $\eta$ belongs to a normal curve whose standard deviation is $\sigma$,
where $\sigma^2$ is the weighted sum of the squares of the standard deviations
of $\epsilon_1$, $\epsilon_2$ . . .
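The moments of the normal curve used in this argument can be checked numerically. The sketch below is only an illustration: it assumes the usual closed form, $\mu_{2t} = \frac{(2t)!}{2^t\,t!}\sigma^{2t}$ with all odd moments zero, and compares it with a plain midpoint-rule integration of $x^s y$. The function names, the value of `sigma` and the integration settings are arbitrary choices made for the example, not part of the text.

```python
import math

def normal_moment_closed_form(s, sigma):
    """s-th moment about the mean of a normal curve, from the closed form."""
    if s % 2 == 1:
        return 0.0
    t = s // 2
    return math.factorial(2 * t) / (2 ** t * math.factorial(t)) * sigma ** (2 * t)

def normal_moment_numeric(s, sigma, half_width=12.0, steps=20_000):
    """Same moment by midpoint-rule integration over +/- half_width sigmas."""
    start = -half_width * sigma
    h = 2.0 * half_width * sigma / steps
    total = 0.0
    for k in range(steps):
        x = start + (k + 0.5) * h
        y = math.exp(-x * x / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))
        total += x ** s * y * h
    return total

sigma = 1.5  # arbitrary illustrative value
for s in range(7):
    print(s, normal_moment_closed_form(s, sigma), round(normal_moment_numeric(s, sigma), 6))
```

The two columns should agree to within the accuracy of the integration, which is the sense in which a curve having these moments of all orders is identified with the normal curve.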

Evidently some slight dependence between the elements can be 
admitted without seriously vitiating the result. 

Notice especially that the proof involves the ignoring of $m^{-\frac{1}{2}}$, where
$m$ is the number of elements. If the $a$'s were all unity and $m = 100$,
$\gamma$ (which measures the range of the frequency curve of the elements)
must be comparable with .1.

If »!"* and m~^ be ignored, it can be shown by the same method 
that we obtain the equation of page 329. 

As an example of the magnitudes involved in this analysis consider
the sum of 100 quantities taken at random from the natural digits in
the first decimal place, .0, .1 . . . .9. Here $m = 100$, $\gamma^2 = .0825$ (the
mean of the squares of the deviations of .0, .1 . . . .9 from their average
.45); $a_1 = a_2 = \ldots = 1$. $\sigma^2 = \Sigma\,.0825 = 100 \times .0825$. $\sigma = 2.87$.

The sum, whose mean is 45, has for curve of frequency a normal
curve with standard deviation 2.87, when quantities of the order
$m^{-\frac{1}{2}} = .1$ are ignored.
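The figures in this example can be reproduced directly. The following sketch computes the mean square deviation of the ten digits, the standard deviation of the sum of 100 of them (about 2.87), and then, as an added check that is not in the text, builds the exact frequency of the sum by repeated convolution and compares the proportion lying within one and two standard deviations of 45 with the corresponding normal proportions. Sums are held in tenths so that the counting stays in whole numbers; the helper names are assumptions made for the illustration.

```python
import math

digits_tenths = list(range(10))                 # .0, .1, ... .9 expressed in tenths
mean = sum(digits_tenths) / 10 / 10             # average of the digits, .45
gamma_sq = sum((d / 10 - mean) ** 2 for d in digits_tenths) / 10

m = 100
sigma = math.sqrt(m * gamma_sq)                 # standard deviation of the sum

# counts[k] = number of ways in which m digits can total k tenths (exact integers)
counts = [1]
for _ in range(m):
    nxt = [0] * (len(counts) + 9)
    for total, ways in enumerate(counts):
        for d in digits_tenths:
            nxt[total + d] += ways
    counts = nxt

def share_within(k):
    """Exact proportion of sums within k standard deviations of the mean 45."""
    limit = 10 * k * sigma                      # the limit expressed in tenths
    inside = sum(ways for total, ways in enumerate(counts) if abs(total - 450) <= limit)
    return inside / 10 ** m

def normal_share_within(k):
    """Proportion of a normal curve within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

print(round(gamma_sq, 4), round(sigma, 3))
for k in (1, 2):
    print(k, round(share_within(k), 4), round(normal_share_within(k), 4))
```

The exact and the normal proportions differ only slightly, the remaining difference coming largely from the discreteness of the sum.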

Note. If $u$, $v$ are any quantities varying in a random way about
their means, $\bar u$, $\bar v$, and $u_t = \bar u + \delta_t$, $v_t = \bar v + \delta'_t$, where $u_t$, $v_t$ are any pair of
values, and if $n$ pairs are taken, $n$ being very great, $\Sigma u_t\cdot v_t = n\bar u\bar v
+ \bar u\,\Sigma\delta'_t + \bar v\,\Sigma\delta_t + \Sigma\delta_t\cdot\delta'_t$. Here $\Sigma\delta_t = 0 = \Sigma\delta'_t$ and $\frac{1}{n}\Sigma\delta_t\delta'_t$ tends to zero as $n$
increases if there is no correlation between $u$ and $v$ (see p. 319, line 3).

$\therefore$ $\bar u\cdot\bar v$ is the mean value of $u\cdot v$, and the mean of the product is
the product of the means if the quantities are independent. If not, the
correlation coefficient is involved.
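A small numerical illustration of this note, with made-up figures: when every value of $u$ is paired with every value of $v$ there is no correlation, and the mean of the products equals the product of the means exactly; when the values are paired in step, the two differ by the mean product of the deviations. The lists below are assumptions chosen only to make the arithmetic easy to follow.

```python
from itertools import product
from statistics import fmean

# Uncorrelated case: every u is combined with every v.
u = [2.0, 5.0, 11.0]
v = [1.0, 4.0, 4.0, 7.0]
pairs = list(product(u, v))
mean_of_products = fmean([a * b for a, b in pairs])
product_of_means = fmean(u) * fmean(v)

# Correlated case: values paired in order, large with large.
u2 = [1.0, 2.0, 3.0, 4.0]
v2 = [2.0, 4.0, 6.0, 8.0]
mean_of_products_c = fmean([a * b for a, b in zip(u2, v2)])
product_of_means_c = fmean(u2) * fmean(v2)
mean_deviation_product = mean_of_products_c - product_of_means_c

print(mean_of_products, product_of_means)
print(mean_of_products_c, product_of_means_c, mean_deviation_product)
```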



ADDENDA AND CORRIGENDA. 351 

Similarly the mean of $u\cdot v\cdot w$ = mean of $u\cdot v$ × mean of $w$ = $\bar u\cdot\bar v\cdot\bar w$,
where $w$ is a third quantity, and so on for any finite number of factors.



Correlation.

The more general conditions under which the normal surface of
correlation arises (see pp. 316, 317) are obtained by Professor Edgeworth
(loc. cit., pp. 116 seq.) by a similar analysis and defined by the
following hypotheses:

In the notation of page 316, take $a_1 = a_2 = \ldots = 1 = b_1 = b_2 = \ldots$ for
simplicity, and let the $e$'s and the $\epsilon$'s have any curves of frequency
satisfying the conditions described above. Let the $e$'s be mutually
independent and let the $\epsilon$'s be mutually independent, but let some of

the pairs, such as e^^ €^, be related to each other. Let r = — =^ . Then 

it can be shown that the surface of frequency of {x^ y) has the same 
volume, centre of gravity, and double moments of all orders as the 
surface whose equation is stated at the bottom of page 317. 

The outline of the proof is as follows : — 

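Let $X = x\cos\theta + y\sin\theta$, $Y = -x\sin\theta + y\cos\theta$, where $(\sigma_1^2 - \sigma_2^2)\tan 2\theta
= 2r\sigma_1\sigma_2$. Let $e' = e\cos\theta + \epsilon\sin\theta$, $\epsilon' = -e\sin\theta + \epsilon\cos\theta$.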

Then, as above — 

where o-^ (t\ are standard deviations of e^ €^, 

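It can be shown that if $\sigma_X$, $\sigma_Y$ are standard deviations of $X$ and $Y$:

$$\sigma_X^2 + \sigma_Y^2 = \sigma_1^2 + \sigma_2^2,\quad \sigma_X^2 - \sigma_Y^2 = (\sigma_1^2 - \sigma_2^2)\sec 2\theta,\quad \sigma_X^2\sigma_Y^2 = \sigma_1^2\sigma_2^2(1 - r^2),$$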

and nr^a-^ — ^r^y 

where $r_t$ is the mean value of $e_t\epsilon_t$. The sum of all possible pairs $e'_t\epsilon'_t$ can
be shown to be zero.
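The effect of the change of axes can be verified with a few lines of arithmetic. The sketch below assumes the standard results for a rotation through $\theta$, with $\tan 2\theta = 2r\sigma_1\sigma_2/(\sigma_1^2 - \sigma_2^2)$: the new variables $X$, $Y$ are uncorrelated, the sum of the squared standard deviations is unchanged, their difference is $(\sigma_1^2 - \sigma_2^2)\sec 2\theta$, and their product is $\sigma_1^2\sigma_2^2(1 - r^2)$. The numerical values of `sigma_1`, `sigma_2` and `r` are arbitrary and serve only as an illustration.

```python
import math

sigma_1, sigma_2, r = 3.0, 2.0, 0.6   # illustrative values only

# Angle defined by (sigma_1^2 - sigma_2^2) tan(2 theta) = 2 r sigma_1 sigma_2
two_theta = math.atan2(2 * r * sigma_1 * sigma_2, sigma_1 ** 2 - sigma_2 ** 2)
theta = two_theta / 2
c, s = math.cos(theta), math.sin(theta)
mean_xy = r * sigma_1 * sigma_2        # mean product of x and y

# Mean squares and mean product of X = x cos(theta) + y sin(theta),
# Y = -x sin(theta) + y cos(theta)
var_X = sigma_1 ** 2 * c * c + sigma_2 ** 2 * s * s + 2 * mean_xy * s * c
var_Y = sigma_1 ** 2 * s * s + sigma_2 ** 2 * c * c - 2 * mean_xy * s * c
mean_XY = (sigma_2 ** 2 - sigma_1 ** 2) * s * c + mean_xy * (c * c - s * s)

print(round(mean_XY, 12))
print(var_X + var_Y, sigma_1 ** 2 + sigma_2 ** 2)
print(var_X - var_Y, (sigma_1 ** 2 - sigma_2 ** 2) / math.cos(two_theta))
print(var_X * var_Y, sigma_1 ** 2 * sigma_2 ** 2 * (1 - r ** 2))
```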
Then the identity — 

where $\alpha$ and $\beta$ are carriers, treated in a similar way to that on page 349
above, leads to:

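$$1 + \ldots + \frac{\alpha^t\beta^s}{t!\,s!}\,r_{ts}\,\sigma_X^t\sigma_Y^s + \ldots = 1 + \ldots + \frac{\alpha^{2t}\beta^{2s}}{2^{t+s}\,t!\,s!}\,\sigma_X^{2t}\sigma_Y^{2s} + \ldots,$$

where all integral and zero values are to be given in succession to $s$ and
$t$, and $r_{ts}$ is the mean value of $\frac{X^tY^s}{\sigma_X^t\sigma_Y^s}$.

Equating coefficients, $r_{ts} = 0$ when $t$ or $s$ is odd, and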



352 ADDENDA AND CORRIGENDA. 

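$$r_{2t,2s} = \frac{(2t)!\,(2s)!}{2^{t+s}\,t!\,s!} = \iint Z\left(\frac{X}{\sigma_X}\right)^{2t}\left(\frac{Y}{\sigma_Y}\right)^{2s}dX\,dY,$$

where

$$Z = \frac{1}{2\pi\sigma_X\sigma_Y}\,e^{-\frac{1}{2}\left(\frac{X^2}{\sigma_X^2} + \frac{Y^2}{\sigma_Y^2}\right)}.$$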

The surface $Z$ is therefore the surface of frequency for $X$, $Y$.

Transferring back to $x$, $y$, and substituting $c_1^2 = 2\sigma_1^2$, $c_2^2 = 2\sigma_2^2$, we
obtain the equation required.

The meaning of r is already known and the paragraph beginning on 
page 318 (last line but 5) is superfluous. 

Simpler Proof for the Standard Deviation of a Sum.

The following analysis is based on papers by various authorities. 

The standard deviation of a sum can be shown to be the square 
root of the sum of the squares of the standard deviations of its parts by 
a simple method; but the very important deduction, which alone 
assigns the probabilities of deviations, that its curve of frequency is 
normal under the conditions defined above, cannot be so obtained. 

Let there be $m$ quantities $u_1, u_2 \ldots u_m$, whose average is zero and
standard deviation $\sigma_1$, and also $n$ quantities $v_1, v_2 \ldots v_n$, whose average
is zero and standard deviation $\sigma_2$. Let $w = u_s + v_t$ be obtained by
adding any $v$ to any $u$. Then there are $m \times n$ possible and equally
probable values of $w$, whose mean is 0. Let $\sigma$ be the mean square
deviation for $w$.
Then

$$\begin{aligned}
mn\sigma^2 &= (u_1 + v_1)^2 + (u_1 + v_2)^2 + (u_1 + v_3)^2 + \ldots \quad (n \text{ terms}) \\
&\quad + (u_2 + v_1)^2 + (u_2 + v_2)^2 + \ldots \quad (n \text{ terms}) \\
&\quad + \ldots \quad (m \text{ lines}) \\
&= n(u_1^2 + u_2^2 + \ldots + u_m^2) + m(v_1^2 + v_2^2 + \ldots + v_n^2) + 2u_1\Sigma v + 2u_2\Sigma v + \ldots \\
&= n\cdot m\sigma_1^2 + m\cdot n\sigma_2^2 + 2\Sigma u\cdot\Sigma v.
\end{aligned}$$

But $\Sigma u = 0 = \Sigma v$ by hypothesis.

$\therefore \sigma^2 = \sigma_1^2 + \sigma_2^2$.

Now add a third quantity whose mean square deviation is $\sigma_3$, and
we have:

Square of mean square deviation of sum $= \sigma^2 + \sigma_3^2 = \sigma_1^2 + \sigma_2^2 + \sigma_3^2$.
So with $n$ quantities, the mean square deviation of the sum is

$$\sqrt{\sigma_1^2 + \sigma_2^2 + \ldots + \sigma_n^2}.$$

In this any of the quantities may be subtractive instead of additive. 
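The enumeration used in this proof is easy to carry out for small groups. In the sketch below every $u$ is combined with every $v$, exactly as in the $m \times n$ array of the proof, and the mean square deviation of the resulting sums is compared with $\sqrt{\sigma_1^2 + \sigma_2^2}$; a third group is then subtracted rather than added. The three lists are invented for the illustration, each having average zero as the proof supposes.

```python
import math
from itertools import product

def mean_square_deviation(values):
    """Root of the mean squared deviation from the average."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))

u = [-3.0, -1.0, 4.0]          # average zero
v = [-2.0, -2.0, 1.0, 3.0]     # average zero
t = [-5.0, 5.0]                # average zero, to be subtracted

w = [a + b for a, b in product(u, v)]             # all m x n equally likely sums
w3 = [a + b - c for a, b, c in product(u, v, t)]  # third quantity subtractive

s1, s2, s3 = (mean_square_deviation(x) for x in (u, v, t))
print(mean_square_deviation(w), math.sqrt(s1 ** 2 + s2 ** 2))
print(mean_square_deviation(w3), math.sqrt(s1 ** 2 + s2 ** 2 + s3 ** 2))
```

No smallness of the quantities is needed for the agreement, only that every combination is counted once.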



ADDENDA AND CORRIGENDA. 353 

There is no assumption in this that any of the magnitudes are small, 
but only that every term as written is as likely to occur as any other 
term ; but it is not assumed that the w's are all different, they may be 
grouped in any curve of frequency. 

Similarly the standard deviation of the sum $a_1u + a_2v + \ldots$, where
$a_1, a_2 \ldots$ are constants, is

$$\sqrt{a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + \ldots}.$$
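The same enumeration serves for a sum with constant multipliers. The following sketch, with arbitrary constants `a1` and `a2` and invented lists, compares the mean square deviation of all values of $a_1u + a_2v$ with $\sqrt{a_1^2\sigma_1^2 + a_2^2\sigma_2^2}$.

```python
import math
from itertools import product

def mean_square_deviation(values):
    """Root of the mean squared deviation from the average."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))

u = [-3.0, -1.0, 4.0]
v = [-2.0, -2.0, 1.0, 3.0]
a1, a2 = 2.0, -0.5             # illustrative constants; a2 is subtractive

combined = [a1 * x + a2 * y for x, y in product(u, v)]
by_enumeration = mean_square_deviation(combined)
by_formula = math.sqrt(a1 ** 2 * mean_square_deviation(u) ** 2
                       + a2 ** 2 * mean_square_deviation(v) ** 2)
print(by_enumeration, by_formula)
```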

Accuracy of Averages and the Theory of Error.

The application of pages 304 and 305 to the formulae of pages 
204-6 is as follows : — 

Page 204, II. If $c_1, c_2 \ldots$ be the moduli of the curves of frequency
to which $e_1, e_2 \ldots$ belong respectively, then

is the modulus for the error of the average. 

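If $c_1 = c_2 = \ldots = c_n = c$, and if $m_t = \bar m + \mu_t$, where $m_t$ is any one of the
quantities, and $n\bar m = \Sigma m$, then the modulus becomes

$$c\cdot\sqrt{\Sigma(\bar m + \mu_t)^2} \div n\bar m = c\cdot\sqrt{n\bar m^2 + 2\bar m\,\Sigma\mu_t + \Sigma\mu_t^2} \div n\bar m = \frac{c}{\sqrt n}\sqrt{1 + \frac{\sigma_m^2}{\bar m^2}},$$

since $\Sigma\mu_t = 0$; where $\Sigma\mu_t^2 = n\sigma_m^2$, that is where $\sigma_m$ is the standard
deviation of the $m$'s.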
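As a check on the reduction just made, the modulus may be computed both ways for a small set of quantities: directly as $c\sqrt{\Sigma m^2} \div n\bar m$, and in the reduced form $\frac{c}{\sqrt n}\sqrt{1 + \sigma_m^2/\bar m^2}$. The value of `c` and the list `m` below are assumptions chosen for the illustration.

```python
import math

c = 0.02                                  # common modulus of the errors (illustrative)
m = [12.0, 15.0, 9.0, 20.0, 14.0]         # the quantities averaged (illustrative)

n = len(m)
m_bar = sum(m) / n
sigma_m = math.sqrt(sum((x - m_bar) ** 2 for x in m) / n)

direct = c * math.sqrt(sum(x * x for x in m)) / (n * m_bar)
reduced = c / math.sqrt(n) * math.sqrt(1 + sigma_m ** 2 / m_bar ** 2)

print(direct, reduced)
```

The reduced form shows that the modulus can never fall below $c/\sqrt n$, and exceeds it the more the quantities differ among themselves.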

Page 205, IV., and page 206, V. With the same notation the
modulus for the error of the product or quotient is $\sqrt{c_1^2 + c_2^2 + \ldots}$.

Page 205, III. Let $\bar m$, $\mu_t$, $\sigma_m$ have the meanings just given. Let
$\bar w$ and $\sigma_w$ be the mean and the standard deviation of the $w$'s, and let
any $w$ be given by $w_t = \bar w + \delta_t$, so that $\Sigma\delta_t = 0$.

The error in the weighted average is — 

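$$\left(\frac{\Sigma mw(1 + e)(1 + \epsilon)}{\Sigma w(1 + \epsilon)} - \frac{\Sigma mw}{\Sigma w}\right) \div \frac{\Sigma mw}{\Sigma w} = \frac{\Sigma mwe}{\Sigma mw} + \frac{\Sigma w\cdot\Sigma mw(1 + \epsilon) - \Sigma mw\cdot\Sigma w(1 + \epsilon)}{\Sigma mw\cdot\Sigma w(1 + \epsilon)},$$

neglecting the product $e\epsilon$,

$$= \frac{\Sigma mwe}{\Sigma mw} + \left(\frac{\Sigma mw\epsilon}{\Sigma mw} - \frac{\Sigma w\epsilon}{\Sigma w}\right),$$

neglecting $\epsilon^2$.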
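How little is lost by neglecting the products of the errors can be seen from a numerical case. The sketch below takes the proportional error of the weighted average $\Sigma mw / \Sigma w$ when each $m$ is in error by the fraction $e$ and each $w$ by the fraction $\epsilon$, and compares the exact value with the first-order expression $\frac{\Sigma mwe}{\Sigma mw} + \frac{\Sigma mw\epsilon}{\Sigma mw} - \frac{\Sigma w\epsilon}{\Sigma w}$. All four lists are invented for the illustration.

```python
m = [10.0, 20.0, 30.0, 40.0]            # quantities averaged (illustrative)
w = [1.0, 2.0, 3.0, 4.0]                # weights (illustrative)
e = [0.002, -0.001, 0.003, 0.001]       # proportional errors in the m's
eps = [0.001, 0.002, -0.002, 0.003]     # proportional errors in the w's

sum_mw = sum(mi * wi for mi, wi in zip(m, w))
sum_w = sum(w)
true_average = sum_mw / sum_w

# Weighted average as actually computed from the erroneous m's and w's
observed_average = (
    sum(mi * wi * (1 + ei) * (1 + xi) for mi, wi, ei, xi in zip(m, w, e, eps))
    / sum(wi * (1 + xi) for wi, xi in zip(w, eps))
)
exact_error = (observed_average - true_average) / true_average

# First-order expression, products and squares of the errors neglected
first_order = (
    sum(mi * wi * ei for mi, wi, ei in zip(m, w, e)) / sum_mw
    + sum(mi * wi * xi for mi, wi, xi in zip(m, w, eps)) / sum_mw
    - sum(wi * xi for wi, xi in zip(w, eps)) / sum_w
)

print(exact_error, first_order)
```

With errors of a few parts in a thousand, the two values differ only by a quantity of the order of the squares and products of the errors.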

Let $c$ be the modulus for each of the $e$'s, and $\gamma$ for each of the $\epsilon$'s.
Let $K_1$ be the modulus for the average when the $\epsilon$'s are neglected, and


354 ADDENDA AND CORRIGENDA. 

$K_2$ when the $e$'s are neglected. Then the complete modulus for the
average is:

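The denominator $= (n\bar m\bar w + n\,r\,\sigma_m\sigma_w)^2$, where $r = \frac{\Sigma\mu_t\delta_t}{n\sigma_m\sigma_w}$ and is the
coefficient of correlation between $m$ and $w$ (p. 319).

The numerator $= n\bar m^2\bar w^2 + n\bar m^2\sigma_w^2 + n\bar w^2\sigma_m^2 + 4\bar m\bar w\,\Sigma\mu_t\delta_t + 2\bar m\,\Sigma\mu_t\delta_t^2 + 2\bar w\,\Sigma\mu_t^2\delta_t + \Sigma\mu_t^2\delta_t^2$.

Let $r_{12} = \frac{\Sigma\mu_t\delta_t^2}{n\sigma_m\sigma_w^2}$ and $r_{21} = \frac{\Sigma\mu_t^2\delta_t}{n\sigma_m^2\sigma_w}$, and let $\mu_t^2 = \sigma_m^2 + S_t$, $\delta_t^2 = \sigma_w^2 + S'_t$.
Then

$$\Sigma\mu_t^2\delta_t^2 = n\sigma_m^2\sigma_w^2 + \Sigma S_tS'_t,$$

since $\Sigma S_t = 0 = \Sigma S'_t$.

The numerator becomes

$$n\left(\bar m^2\bar w^2 + \bar m^2\sigma_w^2 + \bar w^2\sigma_m^2 + 4r\,\bar m\bar w\,\sigma_m\sigma_w + 2r_{12}\,\bar m\,\sigma_m\sigma_w^2 + 2r_{21}\,\bar w\,\sigma_m^2\sigma_w + \sigma_m^2\sigma_w^2 + \frac{1}{n}\Sigma S_tS'_t\right),$$

and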



where $\sigma'_m = \frac{\sigma_m}{\bar m}$ and $\sigma'_w = \frac{\sigma_w}{\bar w}$.






-2^12 



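Now let $M$ be the weighted average in question, $\bar m$ the unweighted
average.

$$M = \frac{\Sigma mw}{\Sigma w} = \frac{n\bar m\bar w + n\,r\,\sigma_m\sigma_w}{n\bar w} = \bar m\left(1 + r\,\frac{\sigma_m}{\bar m}\cdot\frac{\sigma_w}{\bar w}\right),$$

that is, the weighted and unweighted averages are equal if the $w$'s and
$m$'s are uncorrelated.*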
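The relation between the weighted and the unweighted average can be tried on small examples. The sketch assumes the form $M = \bar m\,(1 + r\,\sigma_m\sigma_w / \bar m\bar w)$, where $r$ is the coefficient of correlation between the quantities and their weights; the function name and the two sets of figures are invented for the illustration. In the first set the large weights go with the large quantities; in the second every quantity occurs equally often with every weight, so that $r$ is zero.

```python
import math

def weighted_average_check(m, w):
    """Return (weighted average, value given by the formula, correlation r)."""
    n = len(m)
    m_bar = sum(m) / n
    w_bar = sum(w) / n
    sigma_m = math.sqrt(sum((x - m_bar) ** 2 for x in m) / n)
    sigma_w = math.sqrt(sum((x - w_bar) ** 2 for x in w) / n)
    r = sum((a - m_bar) * (b - w_bar) for a, b in zip(m, w)) / (n * sigma_m * sigma_w)
    weighted = sum(a * b for a, b in zip(m, w)) / sum(w)
    formula = m_bar * (1 + r * (sigma_m / m_bar) * (sigma_w / w_bar))
    return weighted, formula, r

# Large weights found with large quantities
print(weighted_average_check([10.0, 20.0, 30.0, 40.0], [1.0, 2.0, 3.0, 4.0]))

# Weights uncorrelated with the quantities
print(weighted_average_check([10.0, 10.0, 30.0, 30.0], [1.0, 3.0, 1.0, 3.0]))
```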

If the correlation is considerable, that is, if the large weights are to
be found with the large quantities and vice versa, or the large and small
come together, we cannot proceed further.

$r$ is always numerically less than 1, and if the distribution of the $m$'s



